WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau 




PCT 

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(51) International Patent Classification 6 : 

C12N 15/63, 15/31, C07K 14/245, C12N 
15/62, C12P 21/02 



A1



(11) International Publication Number: WO 99/51753 

(43) International Publication Date: 14 October 1999 (14.10.99)



(21) International Application Number: PCT/CA99/00272 

(22) International Filing Date: 29 March 1999 (29.03.99) 



(30) Priority Data: 

09/053,197 
09/085,761 



1 April 1998 (01.04.98) US 
28 May 1998 (28.05.98) US 



(63) Related by Continuation (CON) or Continuation-in-Part 
(CIP) to Earlier Application 

US 09/085,761 (CIP) 

Filed on 28 May 1998 (28.05.98) 



(71) Applicant (for all designated States except US): THE GOVERNORS
OF THE UNIVERSITY OF ALBERTA [CA/CA];
2J2.27 Walter Mackenzie Center, Edmonton, Alberta T6J 
2C2 (CA). 

(72) Inventors; and 

(75) Inventors/Applicants (for US only): WEINER, Joel, Hirsch 
[CA/CA]; 41 Fairway Drive, Edmonton, Alberta T6J 2C2 
(CA). TURNER, Raymond, Joseph [CA/CA]; 3707 Centre 
B. Street N.W., Calgary, Alberta T2K 0W1 (CA). 



(74) Agent: CALDWELL, Roseann, B.; Bennett Jones, 4500 
Bankers Hall East, 855 - 2nd Street S.W., Calgary, Alberta 
T2P 4K7 (CA). 



(81) Designated States: AU, CA, JP, US, European patent (AT, BE, 
CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, 
NL, PT, SE). 



Published 

With international search report. 

Before the expiration of the time limit for amending the 
claims and to be republished in the event of the receipt of 
amendments. 



(54) Title: COMPOSITIONS AND METHODS FOR PROTEIN SECRETION 



(57) Abstract 



The present invention relates to compositions and methods for secretion of functional proteins in a soluble form by host cells. In 
particular, the invention relates to membrane targeting and translocation proteins, MttA, MttB and MttC and to variants and homologs 
thereof. The membrane targeting and translocation proteins are useful in targeting protein expression to the periplasm of gram negative 
bacteria and to extracellular media of other host cells. Such expression allows secretion of expressed proteins of interest in a functional 
and soluble form, thus facilitating purification and increasing the yield of functional proteins of interest. 



COMPOSITIONS AND METHODS FOR PROTEIN SECRETION 



FIELD OF THE INVENTION 

The present invention relates to compositions and methods for secretion of functional
proteins in a soluble form by host cells. In particular, the invention relates to proteins
involved in targeting expression of a protein of interest extracellularly and to the periplasm,
thus facilitating generation of a functional soluble protein.

BACKGROUND OF THE INVENTION 

Proteins having clinical or industrial value may be obtained using techniques which
facilitate their synthesis in bacterial or in eukaryotic cell cultures. However, once
synthesized, there are often problems in recovering these recombinant proteins in substantial
yields and in a useful form. For example, recombinant proteins expressed in bacteria often
accumulate in the bacterial cytoplasm as insoluble aggregates known as inclusion bodies
[Marston, (1986) Biochem. J. 240:1-12; Schein (1989) Biotechnology 7:1141-1149].
Similarly, recombinant transmembrane proteins which contain both hydrophobic and
hydrophilic regions are intractable to solubilization.

While transmembrane recombinant proteins and recombinant proteins which are
expressed in the cytoplasm may be solubilized by use of strong denaturing solutions (e.g.,
urea, guanidinium salts, detergents, Triton, SDS detergents, etc.), solubilization efficiency is
nevertheless variable and there is no general method of solubilization which works for most
proteins. Additionally, many proteins which are present at high concentrations precipitate out
of solution when the solubilizing agent is removed. Yet a further drawback to solubilization
of recombinant proteins is that denaturing chemicals (e.g., guanidinium salts and urea) contain
reactive primary amines which swamp those of the protein, thus interfering with the protein's
reactive amine groups.

Thus, what is needed is a method for producing soluble proteins.

SUMMARY OF THE INVENTION 

The present invention provides a recombinant polypeptide comprising at least a portion
of an amino acid sequence selected from the group consisting of SEQ ID NOs:47 and 49,
SEQ ID NO:7 and variants and homologs thereof, and SEQ ID NO:8 and variants and
homologs thereof.

This invention further provides an isolated nucleic acid sequence encoding at least a
portion of an amino acid sequence selected from the group consisting of SEQ ID NOs:47 and
49, SEQ ID NO:7 and variants and homologs thereof, and SEQ ID NO:8 and variants and
homologs thereof. In one preferred embodiment, the nucleic acid sequence is contained on a
recombinant expression vector. In a more preferred embodiment, the expression vector is
contained within a host cell.

Also provided by the present invention is a nucleic acid sequence that hybridizes 
under stringent conditions to a nucleic acid sequence encoding an amino acid sequence 
selected from the group consisting of SEQ ID NO:7 and variants and homologs thereof, and 
SEQ ID NO:8 and variants and homologs thereof.

The invention additionally provides a method for expressing a nucleotide sequence of 
interest in a host cell to produce a soluble polypeptide sequence, the nucleotide sequence of 
interest when expressed in the absence of an operably linked nucleic acid sequence encoding 
a twin-arginine signal amino acid sequence produces an insoluble polypeptide, comprising: a) 
providing: i) the nucleotide sequence of interest encoding the insoluble polypeptide; ii) the
nucleic acid sequence encoding the twin-arginine signal amino acid sequence; and iii) the host 
cell, wherein the host cell comprises at least a portion of an amino acid sequence selected 
from the group consisting of SEQ ID NOs:47 and 49, SEQ ID NO:7 and variants and 
homologs thereof, and SEQ ID NO:8 and variants and homologs thereof; b) operably linking 
the nucleotide sequence of interest to the nucleic acid sequence to produce a linked 
polynucleotide sequence; and c) introducing the linked polynucleotide sequence into the host 
cell under conditions such that the linked polynucleotide sequence is expressed and the soluble
polypeptide is produced.

Without intending to limit the location of the insoluble polypeptide, in one preferred 
embodiment, the insoluble polypeptide is comprised in an inclusion body. In another 
preferred embodiment, the insoluble polypeptide comprises a cofactor. In a more preferred 
embodiment, the cofactor is selected from the group consisting of iron-sulfur clusters, 
molybdopterin, polynuclear copper, tryptophan tryptophylquinone, and flavin adenine 
dinucleotide. 

Without limiting the location of the soluble polypeptide to any particular location, in
one preferred embodiment, the soluble polypeptide is comprised in periplasm of the host cell. 
In an alternative preferred embodiment, the host cell is cultured in medium, and the soluble 
polypeptide is contained in the medium.

The methods of the invention are not intended to be limited to any particular cell.
However, in one preferred embodiment, the cell is Escherichia coli. In a more preferred 
embodiment, the Escherichia coli cell is D-43. 

It is not intended that the invention be limited to a particular twin-arginine signal 
amino acid sequence. In a preferred embodiment, the twin-arginine signal amino acid 
sequence is selected from the group consisting of SEQ ID NO:41 and SEQ ID NO:42. 

The invention further provides a method for expressing a nucleotide sequence of 
interest encoding an amino acid sequence of interest in a host cell, comprising: a) providing: 
i) the host cell; ii) the nucleotide sequence of interest; iii) a first nucleic acid sequence 
encoding twin-arginine signal amino acid sequence; and iv) a second nucleic acid sequence 
encoding at least a portion of an amino acid sequence selected from the group consisting of 
SEQ ID NOs:47 and 49, SEQ ID NO:7 and variants and homologs thereof, and SEQ ID 
NO:8 and variants and homologs thereof; b) operably fusing the nucleotide sequence of 
interest to the first nucleic acid sequence to produce a fused polynucleotide sequence; and c) 
introducing the fused polynucleotide sequence and the second nucleic acid sequence into the 
host cell under conditions such that the at least portion of the amino acid sequence selected 
from the group consisting of SEQ ID NOs:47 and 49, SEQ ID NO:7 and variants and 
homologs thereof, and SEQ ID NO:8 and variants and homologs thereof is expressed, and the 
fused polynucleotide sequence is expressed to produce a fused polypeptide sequence 
comprising the twin-arginine signal amino acid sequence and the amino acid sequence of 
interest. 

The location of the expressed amino acid sequence of interest is not intended to be 
limited to any particular location. However, in one preferred embodiment, the expressed 
amino acid sequence of interest is contained in periplasm of the host cell. In a particularly 
preferred embodiment, the expressed amino acid sequence of interest is soluble. Also without 
intending to limit the location of the expressed amino acid sequence of interest, in an 
alternative preferred embodiment, the host cell is cultured in medium, and the expressed 
amino acid sequence of interest is contained in the medium. In a particularly preferred 
embodiment, the expressed amino acid sequence of interest is soluble.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows anaerobic growth of strain a) HB101 and b) D-43 in the presence of 
various electron acceptors: (A) 40 mM nitrate, (□) 35 mM fumarate, (O) 100 mM TMAO or 
(0) 70 mM DMSO. 

Figure 2 shows a Western blot analysis of washed membranes and soluble fractions of 
HB101 and D-43 harboring pDMS160 expressing DmsABC. 

Figure 3 shows A) Nitrate-stained polyacrylamide gel containing periplasmic proteins, 
membrane proteins and cytoplasmic proteins from HB101 and D-43, B) Nitrite-stained 
polyacrylamide gel containing periplasmic proteins from HB101 and D-43, and C) TMAO- 
stained polyacrylamide gel containing periplasmic proteins from HB101 and D-43. 

Figure 4 shows the results of a Western blot analysis of the cellular localization of 
DmsAB in A) HB101 expressing either native DmsABC (pDMS160), DmsABΔC
(pDMSC59X), or FrdABΔCD, and B) equivalent lanes as in Figure 4A, but with the same
plasmids in D-43.

Figure 5 shows a gene map of contig AE00459 noting the positions of the ORFs and 
the clones used in this investigation. 

Figure 6 shows the amino acid sequence (SEQ ID NO:1) of MttA aligned with the
amino acid sequence of YigT of Haemophilus influenzae (SEQ ID NO:2). 

Figure 7 shows the nucleotide sequence (SEQ ID NO:3) of the mttABC operon which 
contains the nucleotide sequence of the three open reading frames, ORF RF[3] nucleotides 
5640-6439 (SEQ ID NO:4), ORF RF[2] nucleotides 6473-7246 (SEQ ID NO:5), and ORF
RF[1] nucleotides 7279-8070 (SEQ ID NO:6) which encode the amino acid sequences of
MttA (SEQ ID NO:1), MttB (SEQ ID NO:7) and MttC (SEQ ID NO:8), respectively.

Figure 8 shows an alignment of the amino acid sequence of the E. coli MttA sequence
(SEQ ID NO:1) with amino acid sequences of Hcf106-ZEAMA (SEQ ID NO:9),
YBEC-ECOLI (SEQ ID NO:10), SYNEC (SEQ ID NO:11), ORF13-RHOER (SEQ ID NO:12),
PSEST-ORF57 (SEQ ID NO:13), YY34-MYCLE (SEQ ID NO:14), HELPY (SEQ ID
NO:15), HAEIN (SEQ ID NO:16), BACSU (SEQ ID NO:17), and ORF4-AZOCH (SEQ ID
NO:18).

Figure 9 shows an alignment of the amino acid sequence of the E. coli MttB sequence
(SEQ ID NO:7) with amino acid sequences of YC43-PROPU (SEQ ID NO:19),
YM16-MARPO (SEQ ID NO:20), ARATH (SEQ ID NO:21), Ymf16-RECAM (SEQ ID NO:22),
Y194-SYNY3 (SEQ ID NO:23), YY33-MYCTU (SEQ ID NO:24), HELPY (SEQ ID
NO:25), YigU-HAEIN (SEQ ID NO:26), YcbT-BACSU (SEQ ID NO:27), YH25-AZOCH
(SEQ ID NO:28) and ARCFU (SEQ ID NO:29).

Figure 10 shows an alignment of the amino acid sequence of the E. coli MttC 
sequence (SEQ ID NO:8) with amino acid sequences of YCFH-ECOLI (SEQ ID NO:30),
YJJV-ECOLI (SEQ ID NO:31), METTH (SEQ ID NO:32), Y009-MYCPN (SEQ ID NO:33), 
YcfH-Myctu (SEQ ID NO:34), HELPY (SEQ ID NO:35), YCFH-HAEIN (SEQ ID NO:36), 
YABC-BACSU (SEQ ID NO:37), SCHPO (SEQ ID NO:38), CAEEL (SEQ ID NO:39) and 
Y218-HUMAN (SEQ ID NO:40). 

Figure 11 shows the nucleotide sequence (SEQ ID NO:45) of the mttABC operon
which contains the mttA1 nucleotide sequence (SEQ ID NO:46) (from nucleic acid number
642 to nucleic acid number 953) encoding the amino acid sequence of MttA1 (SEQ ID
NO:47), and the mttA2 nucleotide sequence (SEQ ID NO:48) (from nucleic acid number 558 
to nucleic acid number 1472) encoding the amino acid sequence of MttA2 (SEQ ID NO:49). 



DEFINITIONS 

To facilitate understanding of the invention, a number of terms are defined below. 
The term "foreign gene" refers to any nucleic acid (e.g., gene sequence) which is 
introduced into a cell by experimental manipulations and may include gene sequences found 
in that cell so long as the introduced gene contains some modification (e.g., a point mutation, 
the presence of a selectable marker gene, etc.) relative to the naturally-occurring gene. 

The term "gene" refers to a DNA sequence that comprises control and coding 
sequences necessary for the production of RNA or a polypeptide. The polypeptide can be 
encoded by a full length coding sequence or by any portion of the coding sequence. 

The terms "gene of interest" and "nucleotide sequence of interest" refer to any gene or 
nucleotide sequence, respectively, the manipulation of which may be deemed desirable for 
any reason, by one of ordinary skill in the art. Such nucleotide sequences include, but are not 
limited to, coding sequences of structural genes (e.g., reporter genes, selection marker genes, 
oncogenes, drug resistance genes, growth factors, etc.), and of regulatory genes (e.g., activator 
protein 1 (AP1), activator protein 2 (AP2), Sp1, etc.). Additionally, such nucleotide
sequences include non-coding regulatory elements which do not encode an mRNA or protein 
product, such as for example, a promoter sequence, an enhancer sequence, etc. 

As used herein the term "coding region" when used in reference to a structural gene 
refers to the nucleotide sequences which encode the amino acids found in the nascent
polypeptide as a result of translation of an mRNA molecule. The coding region is bounded,
in eukaryotes, on the 5' side by the nucleotide triplet "ATG" which encodes the initiator
methionine and on the 3' side by one of the three triplets which specify stop codons (i.e.,
TAA, TAG, TGA).
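
By way of illustration only, the boundaries described above can be located with a short script. This is a minimal sketch, not a gene-finding method: the function name and the example sequence are hypothetical, and it simply reports the first ATG that is followed, in the same reading frame, by one of the three stop triplets.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_coding_region(dna):
    """Return (start, end) of the first ATG...stop region, or None.

    `start` is the index of the A in ATG; `end` is exclusive and
    includes the stop codon.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        # Walk codon by codon, in frame with this ATG.
        for i in range(start + 3, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return start, i + 3
        # No in-frame stop: try the next ATG downstream.
        start = dna.find("ATG", start + 1)
    return None

# Hypothetical sequence, for illustration only.
example = "CCATGGCTAAATGAGG"
region = find_coding_region(example)
if region is not None:
    print(example[region[0]:region[1]])
```

The sketch scans one strand only and ignores introns, so it illustrates the definition and should not be read as an annotation tool.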

Transcriptional control signals in eukaryotes comprise "promoter" and "enhancer"
elements. Promoters and enhancers consist of short arrays of DNA sequences that interact
specifically with cellular proteins involved in transcription [Maniatis, et al., Science 236:1237
(1987)]. Promoter and enhancer elements have been isolated from a variety of eukaryotic
sources including genes in yeast, insect and mammalian cells and viruses (analogous control
elements, i.e., promoters, are also found in prokaryotes). The selection of a particular
promoter and enhancer depends on what cell type is to be used to express the protein of
interest. Some eukaryotic promoters and enhancers have a broad host range while others are
functional in a limited subset of cell types [for review see Voss, et al., Trends Biochem. Sci.,
11:287 (1986) and Maniatis, et al., Science 236:1237 (1987)].

The term "wild-type" refers to a gene or gene product which has the characteristics of
that gene or gene product when isolated from a naturally occurring source. A wild-type gene
is that which is most frequently observed in a population and is thus arbitrarily designated the
"normal" or "wild-type" form of the gene. In contrast, the term "modified" or "mutant" refers
to a gene or gene product which displays modifications in sequence and/or functional
properties (i.e., altered characteristics) when compared to the wild-type gene or gene product.
It is noted that naturally-occurring mutants can be isolated; these are identified by the fact
that they have altered characteristics when compared to the wild-type gene or gene product.

The term "expression vector" as used herein refers to a recombinant DNA molecule
containing a desired coding sequence and appropriate nucleic acid sequences necessary for the
expression of the operably linked coding sequence in a particular host cell. Nucleic acid
sequences necessary for expression in prokaryotes include a promoter, optionally an operator
sequence, a ribosome binding site and possibly other sequences. Eukaryotic cells are known
to utilize promoters, enhancers, and termination and polyadenylation signals.

The terms "targeting vector" or "targeting construct" refer to oligonucleotide sequences
comprising a gene of interest flanked on either side by a recognition sequence which is
capable of homologous recombination of the DNA sequence located between the flanking
recognition sequences into the chromosomes of the target cell or recipient cell. Typically, the
targeting vector will contain 10 to 15 kb of DNA homologous to the gene to be recombined;
this 10 to 15 kb of DNA is generally divided more or less equally on each side of the
selectable marker gene. The targeting vector may contain more than one selectable marker
gene. When more than one selectable marker gene is employed, the targeting vector
preferably contains a positive selectable marker (e.g., the neo gene) and a negative selectable
marker (e.g., the Herpes simplex virus tk (HSV-tk) gene). The presence of the positive
selectable marker permits the selection of recipient cells containing an integrated copy of the
targeting vector whether this integration occurred at the target site or at a random site. The
presence of the negative selectable marker permits the identification of recipient cells
containing the targeting vector at the targeted site (i.e., which has integrated by virtue of
homologous recombination into the target site); cells which survive when grown in medium
which selects against the expression of the negative selectable marker do not contain a copy
of the negative selectable marker. Integration of a replacement-type vector results in the
insertion of a selectable marker into the target gene. Replacement-type targeting vectors may
be employed to disrupt a gene resulting in the generation of a null allele (i.e., an allele
incapable of expressing a functional protein; null alleles may be generated by deleting a
portion of the coding region, deleting the entire gene, introducing an insertion and/or a
frameshift mutation, etc.) or may be used to introduce a modification (e.g., one or more point
mutations) into a gene.
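
The positive/negative selection logic described above can be summarized in a few lines. The sketch below is illustrative only: the class and function names are hypothetical, and it assumes (as is conventional, though not stated above) that the negative marker is lost when integration occurs by homologous recombination and retained when the vector integrates at a random site.

```python
from dataclasses import dataclass

@dataclass
class RecipientCell:
    has_positive_marker: bool  # e.g., neo present (vector integrated somewhere)
    has_negative_marker: bool  # e.g., HSV-tk retained (assumed: random integration)

def survives_double_selection(cell):
    """True if the cell survives both selections.

    Positive selection removes cells with no integrated vector;
    negative selection removes cells that still carry the negative marker.
    """
    return cell.has_positive_marker and not cell.has_negative_marker

outcomes = {
    "no integration": RecipientCell(False, False),
    "random integration": RecipientCell(True, True),
    "targeted integration": RecipientCell(True, False),
}
for label, cell in outcomes.items():
    print(label, survives_double_selection(cell))
```

Under these assumptions only the targeted integrant passes both selections, which is the identification step the definition describes.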

The terms "in operable combination", "in operable order" and "operably linked" as 
used herein refer to the linkage of nucleic acid sequences in such a manner that a nucleic acid 
molecule capable of directing the transcription of a given gene and/or the synthesis of a 
desired protein molecule is produced. The term also refers to the linkage of amino acid 
sequences in such a manner so that a functional protein is produced. 

As used herein, the terms "vector" and "vehicle" are used interchangeably in reference 
to nucleic acid molecules that transfer DNA segment(s) from one cell to another. 

The term "recombinant DNA molecule" as used herein refers to a DNA molecule 
which is comprised of segments of DNA joined together by means of molecular biological 
techniques. 

The term "recombinant protein" or "recombinant polypeptide" as used herein refers to 
a protein molecule which is expressed using a recombinant DNA molecule. 

The term "transfection" as used herein refers to the introduction of a transgene into a 
cell. The term "transgene" as used herein refers to any nucleic acid sequence which is 
introduced into the genome of a cell by experimental manipulations. A transgene may be an
"endogenous DNA sequence," or a "heterologous DNA sequence" (i.e., "foreign DNA"). The
term "endogenous DNA sequence" refers to a nucleotide sequence which is naturally found in
the cell into which it is introduced so long as it does not contain some modification (e.g., a
point mutation, the presence of a selectable marker gene, etc.) relative to the
naturally-occurring sequence. The term "heterologous DNA sequence" refers to a nucleotide sequence
which is not endogenous to the cell into which it is introduced. Heterologous DNA includes
a nucleotide sequence which is ligated to, or is manipulated to become ligated to, a nucleic
acid sequence to which it is not ligated in nature, or to which it is ligated at a different
location in nature. Heterologous DNA also includes a nucleotide sequence which is naturally
found in the cell into which it is introduced and which contains some modification relative to
the naturally-occurring sequence. Generally, although not necessarily, heterologous DNA
encodes RNA and proteins that are not normally produced by the cell into which it is
introduced. Examples of heterologous DNA include reporter genes, transcriptional and
translational regulatory sequences, DNA sequences which encode selectable marker proteins
(e.g., proteins which confer drug resistance), etc. Yet another example of a heterologous
DNA includes a nucleotide sequence which encodes a ribozyme which is found in the cell
into which it is introduced, and which is ligated to a promoter sequence to which it is not
naturally ligated in that cell.

Transfection may be accomplished by a variety of means known to the art including
calcium phosphate-DNA co-precipitation, DEAE-dextran-mediated transfection,
polybrene-mediated transfection, electroporation, microinjection, liposome fusion, lipofection, protoplast
fusion, retroviral infection, biolistics (i.e., particle bombardment) and the like.

The term "stable transfection" or "stably transfected" refers to the introduction and 
integration of a transgene into the genome of the transfected cell. The term "stable 
transfectant" refers to a cell which has stably integrated one or more transgenes into the
genomic DNA. 

As used herein the term "portion" when in reference to a gene refers to fragments of 
that gene. The fragments may range in size from 5 nucleotide residues to the entire 
nucleotide sequence minus one nucleic acid residue. Thus, "an oligonucleotide comprising at 
least a portion of a gene" may comprise small fragments of the gene or nearly the entire gene.

The term "portion" when used in reference to a protein (as in a "portion of a given 
protein") refers to fragments of that protein. The fragments may range in size from four 
amino acid residues to the entire amino acid sequence minus one amino acid. 
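
The size limits in the two "portion" definitions above can be stated as a simple range check. This is a sketch for illustration; the function name and the "gene"/"protein" labels are hypothetical, and lengths are counted in nucleotide residues for a gene and amino acid residues for a protein.

```python
# Minimum fragment sizes taken from the definitions above.
MINIMUM_PORTION = {"gene": 5, "protein": 4}

def is_portion(fragment_length, full_length, kind):
    """True if a fragment of this length counts as a "portion".

    A portion runs from the stated minimum up to the full length
    minus one residue.
    """
    return MINIMUM_PORTION[kind] <= fragment_length <= full_length - 1

print(is_portion(5, 774, "gene"))      # smallest gene portion
print(is_portion(774, 774, "gene"))    # the entire sequence is not a "portion"
print(is_portion(4, 258, "protein"))   # smallest protein portion
```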

The term "isolated" when used in relation to a nucleic acid, as in "an isolated
oligonucleotide" refers to a nucleic acid sequence that is identified and separated from at least 
one contaminant nucleic acid with which it is ordinarily associated in its natural source. 
Isolated nucleic acid is nucleic acid present in a form or setting that is different from that in 
which it is found in nature. In contrast, non-isolated nucleic acids are nucleic acids such as 
DNA and RNA which are found in the state they exist in nature. For example, a given DNA 
sequence (e.g., a gene) is found on the host cell chromosome in proximity to neighboring 
genes; RNA sequences, such as a specific mRNA sequence encoding a specific protein, are 
found in the cell as a mixture with numerous other mRNAs which encode a multitude of 
proteins. However, isolated nucleic acid sequences encoding MttA1, MttA2, MttB or MttC
polypeptides include, by way of example, such nucleic acid sequences in cells ordinarily
expressing MttA1, MttA2, MttB or MttC polypeptides, respectively, where the nucleic acid
sequences are in a chromosomal or extrachromosomal location different from that of natural 
cells, or are otherwise flanked by a different nucleic acid sequence than that found in nature. 
The isolated nucleic acid or oligonucleotide may be present in single-stranded or double- 
stranded form. When an isolated nucleic acid or oligonucleotide is to be utilized to express a 
protein, the oligonucleotide will contain at a minimum the sense or coding strand (i.e., the 
oligonucleotide may be single-stranded). Alternatively, it may contain both the sense and 
anti-sense strands (i.e., the oligonucleotide may be double-stranded).

As used herein, the term "purified" or "to purify" refers to the removal of undesired 
components from a sample. For example, where recombinant MttA1, MttA2, MttB or MttC
polypeptides are expressed in bacterial host cells, the MttA1, MttA2, MttB or MttC
polypeptides are purified by the removal of host cell proteins thereby increasing the percent
of recombinant MttA1, MttA2, MttB or MttC polypeptides in the sample.

As used herein, the term "substantially purified" refers to molecules, either nucleic or 
amino acid sequences, that are removed from their natural environment, isolated or separated, 
and are at least 60% free, preferably 75% free, and more preferably 90% free from other 
components with which they are naturally associated. An "isolated polynucleotide" is 
therefore a substantially purified polynucleotide. 

The term "homology" when used in relation to nucleic acids refers to a degree of 
complementarity. There may be partial homology or complete homology (i.e., identity). A 
partially complementary sequence that at least partially inhibits a completely
complementary sequence from hybridizing to a target nucleic acid is referred to using the 
functional term "substantially homologous." The inhibition of hybridization of the completely 
complementary sequence to the target sequence may be examined using a hybridization assay 
(Southern or Northern blot, solution hybridization and the like) under conditions of low 
stringency. A substantially homologous sequence or probe (i.e., an oligonucleotide which is 
capable of hybridizing to another oligonucleotide of interest) will compete for and inhibit the 
binding (i.e., the hybridization) of a completely homologous sequence to a target under 
conditions of low stringency. This is not to say that conditions of low stringency are such 
that non-specific binding is permitted; low stringency conditions require that the binding of 
two sequences to one another be a specific (i.e., selective) interaction. The absence of non- 
specific binding may be tested by the use of a second target which lacks even a partial degree 
of complementarity (e.g., less than about 30% identity); in the absence of non-specific 
binding the probe will not hybridize to the second non-complementary target. 
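
As an illustration of the "less than about 30% identity" criterion for the second, non-complementary target, percent identity can be computed over two pre-aligned sequences. This is a sketch under stated assumptions: the function names are hypothetical, the sequences are assumed to be already aligned to equal length with "-" marking gaps, and a column with a single gap is counted as a mismatch (other conventions exist).

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned columns in which the two sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    # Ignore columns where both sequences have a gap.
    columns = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)

def is_noncomplementary_control(seq_a, seq_b, threshold=30.0):
    """True if identity falls below the (approximate) 30% cutoff."""
    return percent_identity(seq_a, seq_b) < threshold

# Hypothetical aligned sequences, for illustration only.
print(percent_identity("ACGT", "ACGA"))
print(is_noncomplementary_control("AAAAAAAAAA", "AATTTTTTTT"))
```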

Low stringency conditions when used in reference to nucleic acid hybridization 
comprise conditions equivalent to binding or hybridization at 42°C in a solution consisting of 
5X SSPE (43.8 g/l NaCl, 6.9 g/l NaH2PO4·H2O and 1.85 g/l EDTA, pH adjusted to 7.4 with
NaOH), 0.1% SDS, 5X Denhardt's reagent [50X Denhardt's contains per 500 ml: 5 g Ficoll
(Type 400, Pharmacia), 5 g BSA (Fraction V; Sigma)] and 100 µg/ml denatured salmon
sperm DNA followed by washing in a solution comprising 5X SSPE, 0.1% SDS at 42°C 
when a probe of about 500 nucleotides in length is employed. 

High stringency conditions when used in reference to nucleic acid hybridization 
comprise conditions equivalent to binding or hybridization at 42°C in a solution consisting of 
5X SSPE (43.8 g/l NaCl, 6.9 g/l NaH2PO4·H2O and 1.85 g/l EDTA, pH adjusted to 7.4 with
NaOH), 0.5% SDS, 5X Denhardt's reagent and 100 µg/ml denatured salmon sperm DNA
followed by washing in a solution comprising 0.1X SSPE, 1.0% SDS at 42°C when a probe 
of about 500 nucleotides in length is employed. 
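
The two wash conditions differ chiefly in salt strength (5X SSPE versus 0.1X SSPE). The sketch below converts the 5X recipe given above from g/l to molar concentration and scales it to other strengths. It is illustrative only: the function name is hypothetical, and the EDTA is assumed to be the disodium salt dihydrate, which the text does not specify.

```python
# Molar masses in g/mol. The EDTA entry assumes disodium EDTA dihydrate.
MOLAR_MASS_G_PER_MOL = {
    "NaCl": 58.44,
    "NaH2PO4.H2O": 137.99,
    "Na2EDTA.2H2O": 372.24,
}

# 5X SSPE composition in g/l, as stated in the definitions above.
SSPE_5X_G_PER_L = {
    "NaCl": 43.8,
    "NaH2PO4.H2O": 6.9,
    "Na2EDTA.2H2O": 1.85,
}

def molarity(component, strength_x=5.0):
    """Molar concentration of a component at the given SSPE strength."""
    molar_at_5x = SSPE_5X_G_PER_L[component] / MOLAR_MASS_G_PER_MOL[component]
    return molar_at_5x * strength_x / 5.0

for name in SSPE_5X_G_PER_L:
    low = molarity(name, 5.0) * 1000    # low stringency wash, mM
    high = molarity(name, 0.1) * 1000   # high stringency wash, mM
    print(f"{name}: 5X = {low:.1f} mM, 0.1X = {high:.2f} mM")
```

The fifty-fold drop in salt between the 5X and 0.1X washes is what makes the second condition the more stringent one at the same temperature.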

When used in reference to nucleic acid hybridization the art knows well that numerous 
equivalent conditions may be employed to comprise either low or high stringency conditions;
factors such as the length and nature (DNA, RNA, base composition) of the probe and nature
of the target (DNA, RNA, base composition, present in solution or immobilized, etc.) and the 
concentration of the salts and other components (e.g., the presence or absence of formamide, 
dextran sulfate, polyethylene glycol) are considered and the hybridization solution may be 
varied to generate conditions of either low or high stringency hybridization different from, but 
equivalent to, the above listed conditions. 

As used herein, the terms "nucleic acid molecule encoding," "DNA sequence 
encoding," and "DNA encoding" refer to the order or sequence of deoxyribonucleotides along 
a strand of deoxyribonucleic acid. The order of these deoxyribonucleotides determines the 
order of ribonucleotides along the mRNA chain, and also determines the order of amino acids 
along the polypeptide (protein) chain. The DNA sequence thus codes for the RNA sequence 
and for the amino acid sequence. 

"Nucleic acid sequence" and "nucleotide sequence" as used interchangeably herein 
refer to an oligonucleotide or polynucleotide, and fragments or portions thereof, and to DNA 
or RNA of genomic or synthetic origin which may be single- or double-stranded, and 
represent the sense or antisense strand. 

"Amino acid sequence" and "polypeptide sequence" are used interchangeably herein to 
refer to a sequence of amino acids. 

The term "antisense sequence" as used herein refers to a deoxyribonucleotide sequence 
whose sequence of deoxyribonucleotide residues is in reverse 5' to 3' orientation in relation 
to the sequence of deoxyribonucleotide residues in a sense strand of a DNA duplex. A "sense 
strand" of a DNA duplex refers to a strand in a DNA duplex which is transcribed by a cell in 
its natural state into a "sense mRNA." Sense mRNA generally is ultimately translated into a 
polypeptide. Thus an "antisense" sequence is a sequence having the same sequence as the 
non-coding strand in a DNA duplex. The term "antisense RNA" refers to a ribonucleotide 
sequence whose sequence is complementary to an "antisense" sequence. Alternatively, the 
term "antisense RNA" is used in reference to RNA sequences which are complementary to a 
specific RNA sequence (e.g., mRNA). Antisense RNA may be produced by any method, 
including synthesis by splicing the gene(s) of interest in a reverse orientation to a viral 
promoter which permits the synthesis of a coding strand. Once introduced into a cell, this 
transcribed strand combines with natural mRNA produced by the cell to form duplexes. 
These duplexes then block either the further transcription of the mRNA or its translation. In 
this manner, mutant phenotypes may be generated. The term "antisense strand" is used in 
reference to a nucleic acid strand that is complementary to the "sense" strand. The 
designation (-) (i.e., "negative") is sometimes used in reference to the antisense strand, with 
the designation (+) sometimes used in reference to the sense (i.e., "positive") strand. 
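
The strand relationships described above can be sketched in a few lines. This is a minimal illustration, assuming the common convention that the (+) sense strand has the same sequence as the mRNA (with T in place of U) and that sequences are written 5' to 3'; the function names are hypothetical.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the complementary strand of `dna`, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def sense_mrna(sense_strand):
    """mRNA carrying the same sequence as the (+) strand, with U for T."""
    return sense_strand.upper().replace("T", "U")


def antisense_rna(sense_strand):
    """RNA complementary to the sense mRNA, written 5' to 3'."""
    return reverse_complement(sense_strand).upper().replace("T", "U")


if __name__ == "__main__":
    plus_strand = "ATGGCC"
    print(reverse_complement(plus_strand))  # antisense (-) DNA strand
    print(sense_mrna(plus_strand))
    print(antisense_rna(plus_strand))
```

Because the antisense RNA is the reverse complement of the sense mRNA, the two can base-pair to form the duplexes referred to in the definition.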

The term "biologically active" when made in reference to MttA1, MttA2, MttB or 
MttC refers to a MttA1, MttA2, MttB or MttC molecule, respectively, having biochemical 
functions of a naturally occurring MttA1, MttA2, MttB or MttC. Biological activity of 
MttA1, MttA2, MttB or MttC is determined, for example, by restoration of wild-type 
targeting of proteins which contain a twin-arginine signal amino acid sequence to cell 
membranes and/or translocation of such proteins to the periplasm in cells lacking MttA, MttB 
or MttC activity (i.e., MttA1, MttA2, MttB or MttC null cells). Cells lacking MttA1, MttA2, 
MttB or MttC activity may be produced using methods well known in the art (e.g., point 
mutation and frame-shift mutation) [Sambasivarao et al (1991) J. Bacteriol. 5935-5943; Jasin 
et al (1984) J. Bacteriol. 159:783-786]. Complementation is achieved by transfecting cells 
which lack MttA1, MttA2, MttB or MttC activity with an expression vector which expresses 
MttA1, MttA2, MttB or MttC, a homolog thereof, or a portion thereof. Details concerning 
complementation of cells which contain a point mutation in MttA1, MttA2 are provided in 
Example 6 herein. 

As used herein "soluble" when in reference to a protein produced by recombinant 
DNA technology in a host cell is a protein which exists in solution; if the protein contains a 
twin-arginine signal amino acid sequence the soluble protein is exported to the periplasmic 
space in gram negative bacterial hosts and is secreted into the culture medium by eukaryotic 
cells capable of secretion or by bacterial host possessing the appropriate genes (i.e., the kil 
gene). Thus, a soluble protein is a protein which is not found in an inclusion body inside the 
host cell. Alternatively, a soluble protein is a protein which is not found integrated in cellular 
membranes. In contrast, an insoluble protein is one which exists in denatured form inside 
cytoplasmic granules (called an inclusion body) in the host cell. Alternatively, an insoluble 
protein is one which is present in cell membranes, including but not limited to, cytoplasmic 
membranes, mitochondrial membranes, chloroplast membranes, endoplasmic reticulum 
membranes, etc. 

A distinction is drawn between a soluble protein (i.e., a protein which when expressed 
in a host cell is produced in a soluble form) and a "solubilized" protein. An insoluble 
recombinant protein found inside an inclusion body or found integrated in a cell membrane 
may be solubilized (i.e., rendered into a soluble form) by treating purified inclusion bodies or 
cell membranes with denaturants such as guanidine hydrochloride, urea or sodium dodecyl 
sulfate (SDS). These denaturants must then be removed from the solubilized protein 
preparation to allow the recovered protein to renature (refold). Not all proteins will refold 
into an active conformation after solubilization in a denaturant and removal of the denaturant. 
Many proteins precipitate upon removal of the denaturant. SDS may be used to solubilize 
inclusion bodies and cell membranes and will maintain the proteins in solution at low 
concentration. However, dialysis will not always remove all of the SDS (SDS can form 
micelles which do not dialyze out); therefore, SDS-solubilized inclusion body protein and 
SDS-solubilized cell membrane protein are soluble but not refolded. 

A distinction is also drawn between proteins which are soluble (i.e., dissolved) in a 
solution devoid of significant amounts of ionic detergents (e.g., SDS) or denaturants (e.g., 
urea, guanidine hydrochloride) and proteins which exist as a suspension of insoluble protein 
molecules dispersed within the solution. A soluble protein will not be removed from a 
solution containing the protein by centrifugation using conditions sufficient to remove cells 
present in a liquid medium (e.g., centrifugation at 5,000 x g for 4-5 minutes). 

DESCRIPTION OF THE INVENTION 

The present invention exploits the identification of proteins involved in a Sec- 
independent protein translocation pathway which are necessary for the translocation of 
proteins which contain twin-arginine signal amino acid sequences to the periplasm of gram 
negative bacteria, and into the extracellular media of cells which do not contain a periplasm 
(e.g., gram positive bacteria, eukaryotic cells, etc.), as well as for targeting such proteins to 
cell membranes. The proteins of the invention are exemplified by the Membrane Targeting 
and Translocation proteins MttA1 (103 amino acids), MttA2 (161 amino acids), MttB (258 
amino acids) and MttC (264 amino acids) of E. coli which are encoded by the mttABC 
operon. The invention further exploits the presence of a large number of proteins which are 
widely distributed in organisms extending from archaebacteria to higher eukaryotes. 

The well characterized Sec-dependent export system translocates an unfolded string of 
amino acids to the periplasm, and folding follows as a subsequent step in the periplasm, 
mediated by chaperones and disulfide rearrangement. In contrast to the Sec-dependent export 
pathway, the proteins of the invention translocate fully-folded as well as cofactor-containing 
proteins from the cytoplasm into the bacterial periplasm and are capable of translocating such 
proteins into extracellular medium. Such translocation offers a unique advantage over current 
methodologies for protein purification. Because the composition of culture medium can be 
manipulated, and because the periplasm contains only about 3% of the proteins of gram 
negative bacteria, expressed proteins which are translocated into the extracellular medium or 
into the periplasm are more likely to be expressed as functional soluble proteins than if they 
were translocated to cellular membranes or to the cytoplasm. Furthermore, translocation to 
the periplasm or to the extracellular medium following protein expression in the cytoplasm 
allows the expressed protein to be correctly folded by cytoplasmic enzymes prior to its 
translocation, thus allowing retention of the expressed protein's biological activity. 

The mttABC operon disclosed herein is also useful in screening compounds for 
antibiotic activity by identifying those compounds which inhibit translocation of proteins 
containing twin-arginine signal amino acid sequences in bacteria. For example, DMSO 
reductase has been found to be essential for the pathogenesis of Salmonella [Bowe and 
Heffron (1994) Methods in Enzymology 236:509-526]. Thus, compounds which inhibit 
targeting of DMSO reductase to Salmonella could result in conversion of a virulent bacterial 
strain to an avirulent nonpathogenic variant. 

The invention is further described under (A) mttA, mttB, and mttC nucleotide 
sequences, (B) MttA, MttB, and MttC polypeptides, and (C) Methods for expressing 
polypeptides to produce soluble proteins. 

A. mttA, mttB, and mttC nucleotide sequences 

The present invention discloses the nucleic acid sequence of the mttA1 (SEQ ID 
NO:46), mttA2 (SEQ ID NO:48), mttB (SEQ ID NO:5) and mttC (SEQ ID NO:6) genes 
which form part of the mttABC operon (SEQ ID NO:45) shown in Figure 11. Data presented 
herein demonstrates that the MttA2 polypeptide encoded by mttA2 functions in targeting 
proteins which contain twin-arginine signal amino acid sequences to cell membranes, and in 
translocating such proteins to the periplasm of gram negative bacteria and to the extracellular 
medium of cells which do not contain a periplasm (e.g., gram positive bacteria and eukaryotic 
cells). Data presented herein further shows that the MttB and MttC polypeptides which are 
encoded by mttB and mttC, respectively, also serve the same functions as MttA2. This 
conclusion is based on the inventors' finding that mttA1, mttA2, mttB and mttC form an 
operon which is expressed as a single polycistronic mRNA. 

The function of MttB and MttC may be demonstrated by in vivo homologous 
recombination of chromosomal mttB and mttC, using knockouts in the mttBC operon generated 
by insertion of mini-MudII as previously described [Taylor et al. (1994) J. Bacteriol. 
176:2740-2742]. Alternatively, the function of MttB and MttC may also be demonstrated as 
previously described [Sambasivarao et al (1991) J. Bacteriol. 5935-5943; Jasin et al (1984) J. 
Bacteriol. 159:783-786]. Briefly, the mttABC operon (Figure 11) is cloned into pTZ18R and 
pBR322 vectors. In pBR322, the HindIII site in mttB is unique. The pBR322 containing 
mttB is then modified by insertion of a kanamycin gene cartridge at this unique site, while the 
unique NruI fragment contained in mttC is replaced by a kanamycin cartridge. The modified 
plasmids are then homologously recombined with chromosomal mttB and mttC in E. coli 
cells which contain either a recBC mutation or a recD mutation. The resulting recombinants 
are transferred by P1 transduction to suitable genetic backgrounds for investigation of the 
localization of protein expression. The localization (e.g., cytoplasm, periplasm, cell 
membranes, extracellular medium) of expression of proteins which contain twin-arginine 
signal amino acid sequences is compared using methods disclosed herein (e.g., functional 
enzyme activity and Western blotting) between homologously recombined cells and control 
cells which had not been homologously recombined. Localization of expressed proteins 
which contain twin-arginine signal amino acid sequences in extracellular medium or in the 
periplasm of homologously recombined cells as compared to localization of expression in 
other than the extracellular medium and the periplasm (e.g., in the cytoplasm, in the cell 
membrane, etc.) of control cells demonstrates that the wild-type MttB or MttC protein whose 
function had been modified by homologous recombination functions in translocation of the 
twin arginine containing proteins to the extracellular medium or to the periplasm. 

The present invention contemplates any nucleic acid sequence which encodes one or 
more of MttA1, MttA2, MttB and MttC polypeptide sequences or variants or homologs 
thereof. These nucleic acid sequences are used to make recombinant molecules which express 
the MttA1, MttA2, MttB and MttC polypeptides. For example, one of ordinary skill in the 
art would recognize that the redundancy of the genetic code permits an enormous number of 
nucleic acid sequences which encode the MttA1, MttA2, MttB and MttC polypeptides. Thus, 
codons which are different from those shown in Figure 7 may be used to increase the rate of 
expression of the nucleotide sequence in a particular prokaryotic or eukaryotic expression host 
which has a preference for particular codons. Additionally, alternative codons may also be 
used in eukaryotic expression hosts to generate splice variants of recombinant RNA transcripts 
which have more desirable properties (e.g., longer or shorter half-life) than transcripts 
generated using the sequence depicted in Figure 7. In addition, different codons may also be 
desirable for the purpose of altering restriction enzyme sites or, in eukaryotic expression 
hosts, of altering glycosylation patterns in translated polypeptides. 
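
To give a sense of how large the number of synonymous coding sequences can be, the following sketch counts the distinct nucleotide sequences that encode a given peptide. It assumes the standard genetic code (61 sense codons, 3 stop codons); the function name `count_encodings` and the short example peptide are illustrative and are not taken from the sequences of the invention.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code (sums to 61).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}
STOP_CODONS = 3  # TAA, TAG, TGA


def count_encodings(peptide, include_stop=False):
    """Count the distinct coding sequences for `peptide` (one-letter code)."""
    try:
        total = prod(CODON_COUNTS[residue] for residue in peptide.upper())
    except KeyError as err:
        raise ValueError(f"unknown amino acid: {err.args[0]}") from None
    return total * STOP_CODONS if include_stop else total


if __name__ == "__main__":
    # A hypothetical tripeptide: Met has 1 codon, Arg has 6.
    print(count_encodings("MRR"))
```

The count grows multiplicatively with length, so for polypeptides of a hundred or more residues the number of possible encoding sequences is astronomically large, which is what makes host-specific codon selection possible.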

The nucleic acid sequences of the invention may also be used for in vivo homologous 
recombination with chromosomal nucleic acid sequences. Homologous recombination may be 
desirable to, for example, delete at least a portion of at least one of chromosomal mttA1, 
mttA2, mttB and mttC nucleic acid sequences, or to introduce a mutation in these 
chromosomal nucleic acid sequences as described below. 

Variants of the nucleotide sequences which encode MttA1, MttA2, MttB and MttC and 
which are shown in Figure 7 and Figure 11 are also included within the scope of this 
invention. These variants include, but are not limited to, nucleotide sequences having 
deletions, insertions or substitutions of different nucleotides or nucleotide analogs. 

This invention is not limited to the mttA1, mttA2, mttB and mttC sequences (SEQ ID 
NOs:46, 48, 5 and 6, respectively) but specifically includes nucleic acid homologs which are 
capable of hybridizing to the nucleotide sequence encoding MttA1, MttA2, MttB and MttC 
(Figures 11 and 7), and to portions, variants and homologs thereof. Those skilled in the art 
know that different hybridization stringencies may be desirable. For example, whereas higher 
stringencies may be preferred to reduce or eliminate non-specific binding between the 
nucleotide sequences of Figure 7 and other nucleic acid sequences, lower stringencies may be 
preferred to detect a larger number of nucleic acid sequences having different homologies to 
the nucleotide sequence of Figure 7. 

Portions of the nucleotide sequence encoding MttA1, MttA2, MttB and MttC of 
Figures 11 and 7 are also specifically contemplated to be within the scope of this invention. 
It is preferred that the portions have a length equal to or greater than 10 nucleotides and show 
greater than 50% homology to nucleotide sequences encoding MttA1, MttA2, MttB and MttC 
of Figures 11 and 7. 
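
The preferred length and homology thresholds for a portion can be expressed as a simple check. The sketch below is one possible reading, assuming that "homology" is measured as percent identity over an ungapped, equal-length comparison against a window of the reference sequence; the text does not prescribe a particular alignment method, and the function names are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions that match between two equal-length sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def qualifies_as_portion(fragment, reference_window, min_len=10, min_identity=50.0):
    """True if the fragment meets the preferred length and identity thresholds.

    Length must be >= min_len and identity strictly greater than min_identity,
    mirroring "equal to or greater than 10 nucleotides" and "greater than 50%".
    """
    if len(fragment) < min_len:
        return False
    return percent_identity(fragment, reference_window) > min_identity


if __name__ == "__main__":
    # Illustrative sequences only; not taken from Figures 11 or 7.
    print(percent_identity("ACGTATTTTT", "ACGTACGTAC"))
    print(qualifies_as_portion("ACGTATTTTT", "ACGTACGTAC"))
```

A gapped alignment tool would normally be used for real sequences; the ungapped comparison here only makes the two thresholds concrete.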

The present invention further contemplates antisense molecules comprising the nucleic 
acid sequence complementary to at least a portion of the polynucleotide sequences encoding 
MttA1, MttA2, MttB and MttC (Figures 11 and 7). 

The scope of this invention further encompasses nucleotide sequences containing the 
nucleotide sequence of Figures 11 and 7, portions, variants, and homologs thereof, ligated to 
one or more heterologous sequences as part of a fusion gene. Such fusion genes may be 
desirable, for example, to detect expression of sequences which form part of the fusion gene. 
Examples of a heterologous sequence include the reporter sequence encoding the enzyme 
β-galactosidase or the enzyme luciferase. Fusion genes may also be desirable to facilitate 
purification of the expressed protein. For example, the heterologous sequence of protein A 
allows purification of the fusion protein on immobilized immunoglobulin. Other affinity traps 
are well known in the art and can be utilized to advantage in purifying the expressed fusion 
protein. For example, pGEX vectors (Promega, Madison WI) may be used to express the 
MttA1, MttA2, MttB and MttC polypeptides as a fusion protein with glutathione S-transferase 
(GST). In general, such fusion proteins are soluble and can easily be purified from lysed 
cells by adsorption to glutathione-agarose beads followed by elution in the presence of free 
glutathione. Proteins made in such systems are designed to include heparin, thrombin or 
factor Xa protease cleavage sites so that the cloned polypeptide of interest can be released 
from the GST moiety at will. 

The nucleotide sequences which encode MttA1, MttA2, MttB and MttC (Figures 11 
and 7), portions, variants, and homologs thereof can be synthesized by synthetic chemistry 
techniques which are commercially available and well known in the art. The nucleotide 
sequence of synthesized sequences may be confirmed using commercially available kits as 
well as by methods well known in the art which utilize enzymes such as the Klenow 
fragment of DNA polymerase I, Sequenase®, Taq DNA polymerase, or thermostable T7 
polymerase. Capillary electrophoresis may also be used to analyze the size and confirm the 
nucleotide sequence of the products of nucleic acid synthesis. Synthesized sequences may 
also be amplified using the polymerase chain reaction (PCR) as described by Mullis [U.S. 
Patent No. 4,683,195] and Mullis et al [U.S. Patent No. 4,683,202], or the ligase chain reaction 
[LCR; sometimes referred to as "Ligase Amplification Reaction" (LAR)] described by Barany, 
Proc. Natl. Acad. Sci., 88:189 (1991); Barany, PCR Methods and Applic., 1:5 (1991); and 
Wu and Wallace, Genomics 4:560 (1989). 

It is readily appreciated by those in the art that the mttA1, mttA2, mttB and mttC 
nucleotide sequences of the present invention may be used in a variety of ways. For 
example, fragments of the sequence of at least about 10 bp, more usually at least about 15 bp, 
and up to and including the entire (i.e., full-length) sequence can be used as probes for the 
detection and isolation of complementary genomic DNA sequences from any cell. Genomic 
sequences are isolated by screening a genomic library with all or a portion of the nucleotide 
sequences which encode MttA1, MttA2, MttB and MttC (Figures 11 and 7). In addition to 
screening genomic libraries, the mttA1, mttA2, mttB and mttC nucleotide sequences can also 
be used to screen cDNA libraries made using RNA. 

The mttA1, mttA2, mttB and mttC nucleotide sequences of the invention are also useful 
in directing the synthesis of MttA1, MttA2, MttB, and MttC, respectively. The MttA1, 
MttA2, MttB, and MttC polypeptides find use in producing antibodies which may be used in, 
for example, detecting cells which express MttA1, MttA2, MttB and MttC. These cells may 
additionally find use in directing expression of recombinant proteins to cellular membranes or 
to the periplasm or extracellular medium. Alternatively, cells containing at least one of MttA1, 
MttA2, MttB and MttC may be used to direct expression of recombinant proteins which are 
engineered to contain twin-arginine signal amino acid sequences, or of wild-type proteins 
which contain twin-arginine signal amino acid sequences, to the periplasm or extracellularly 
(as described below), thus reducing the likelihood of formation of insoluble proteins. 

B. MttA, MttB, and MttC polypeptides 

This invention discloses the amino acid sequence of MttA1 (SEQ ID NO:47), and 
MttA2 (SEQ ID NO:49) which are encoded by the mttA1 and mttA2 genes, respectively. 
Data presented herein demonstrates that the protein MttA2 targets twin arginine containing 
proteins (i.e., proteins which contain twin-arginine signal amino acid sequences), as 
exemplified by dimethylsulfoxide (DMSO) reductase (DmsABC), to the cell 
membrane (Examples 2 and 5). The function of MttA2 in membrane targeting of twin 
arginine containing proteins was demonstrated by isolating a pleiotropic-negative mutant in 
mttA2 which prevents the correct membrane targeting of Escherichia coli dimethylsulfoxide 
reductase and results in accumulation of DmsA in the cytoplasm. DmsABC is an integral 
membrane molybdoenzyme which normally faces the cytoplasm and the DmsA subunit has a 
twin-arginine signal amino acid sequence. The mutation in mttA2 changed proline 25 to 
leucine in the encoded MttA2, and was complemented by a DNA fragment encoding the 
mttA2 gene. 

Data presented herein further demonstrates that MttA2 also functions in selectively 
translocating twin arginine containing proteins, as exemplified by nitrate reductase (NapA) 
and trimethylamine N-oxide reductase (TorA), to the periplasm (Example 4). The mutation 
in the mttA2 gene resulted in accumulation of the periplasmic proteins NapA and TorA in the 
cytoplasm and cell membranes. In contrast, proteins with a sec-dependent leader, as 
exemplified by nitrite reductase (NrfA), or which contain a twin-arginine signal amino acid 
sequence and which assemble spontaneously in the membrane, as exemplified by 
trimethylamine N-oxide (TMAO), were not affected by this mutation (Examples 2 and 4). 

The isolation of mutant D-43 which contained a mutant mttA2 gene was unexpected. 
The assembly of multisubunit redox membrane proteins in bacteria and eukaryotic organelles 
has been assumed to be a spontaneous process mediated by protein-protein interactions 
between the integral anchor subunit(s) and the extrinsic subunit(s) [Latour and Weiner (1987) 
J. Gen. Microbiol. 133:597-607; Lemire et al (1983) J. Bacteriol. 155:391-397]. It has 
previously been shown that the extrinsic subunits of fumarate reductase, FrdAB, can be 
reconstituted to form the holoenzyme with the anchor subunits, FrdCD, in vitro without any 
additional proteins [Lemire et al (1983) J. Bacteriol. 155:391-397]. Because the architecture 
of DMSO reductase is similar to that of fumarate reductase, it seemed likely that this protein 
assembled in a similar manner. However, data presented herein demonstrates that this was 
not the case. Thus, the isolation of mutant D-43 was unexpected and it suggests that the 
assembly of DmsABC needs auxiliary proteins for optimal efficiency. Alternatively, the 
assembly of DmsABC may be an evolutionary vestige related to the soluble periplasmic 
DMSO reductase found in several organisms [McEwan (1994) Antonie van Leeuwenhoek 
66:151-164; McEwan et al (1991) Biochem. J. 274:305-307]. 

Without limiting the invention to a particular mechanism, MttA2 is predicted to be a 
membrane protein with two transmembrane segments and a long periplasmic α-helix. Proline 
25 is located after the second transmembrane helix and immediately preceding the long 
periplasmic α-helix suggesting the essential nature of this region of MttA2. Interestingly, the 
smallest complementing DNA fragment, pGS20, only encoded the amino terminal two thirds 
of MttA2. This suggests that the carboxy terminal globular domain is not necessary or can be 
substituted by some other activity. This conclusion is further supported by the observation 
that the carboxy terminal third of MttA2 is also the least conserved region of MttA2. While 
the amino terminal of MttA2 is homologous to YigT of Settles et al (1997) Science 
278:1467-1470, the YigT sequence was not correct throughout its length. Data presented 
herein shows that proteins which were homologous to MttA1 and MttA2 were identified by 
BLAST searches in a wide variety of archaebacteria, eubacteria, cyanobacteria and plants, 
suggesting that the sec-independent translocation system of which MttA1 and MttA2 are 
members is very widely distributed in nature. 

The invention further discloses the amino acid sequence of MttB (SEQ ID NO:7) and 
MttC (SEQ ID NO:8). Without limiting the invention to any particular mechanism, MttB is 
an integral membrane protein with six transmembrane segments and MttC is a membrane 
protein with one or two transmembrane segments and a large cytoplasmic domain. Proteins 
homologous to MttB were identified by BLAST searches in a wide variety of archaebacteria, 
eubacteria, cyanobacteria and plants, suggesting that the protein translocation system of which 
MttB is a member is very widely distributed in nature. The MttC protein was even more 
widely dispersed with homologous proteins identified in archaebacteria, mycoplasma, 
eubacteria, cyanobacteria, yeast, plants, C. elegans and humans. In all cases the related 
proteins were of previously unknown function. 

Without limiting the invention to any particular mechanism, the predicted topology of 
the MttABC proteins suggests that the large cytoplasmic domain of MttC serves a receptor 
function for twin arginine containing proteins, with the integral MttB protein serving as the 
pore for protein transport. Based on the observation that MttA2 can form a long α-helix, 
this protein is predicted to play a role in gating the pore. 

The present invention specifically contemplates variants and homologs of the amino 
acid sequences of MttA1, MttA2, MttB and MttC. A "variant" of MttA1, MttA2, MttB and 
MttC is defined as an amino acid sequence which differs by one or more amino acids from 
the amino acid sequence of MttA1 (SEQ ID NO:47), MttA2 (SEQ ID NO:49), MttB (SEQ ID 
NO:7) and MttC (SEQ ID NO:8), respectively. The variant may have "conservative" 
changes, wherein a substituted amino acid has similar structural or chemical properties, e.g., 
replacement of leucine with isoleucine. More rarely, a variant may have "nonconservative" 
changes, e.g., replacement of a glycine with a tryptophan. Similar minor variations may also 
include amino acid deletions or insertions (i.e., additions), or both. Guidance in determining 
which and how many amino acid residues may be substituted, inserted or deleted without 
abolishing biological or immunological activity may be found using computer programs well 
known in the art, for example, DNAStar software. 

For example, MttA1, MttA2, MttB and MttC variants included within the scope of this 
invention include MttA1, MttA2, MttB and MttC polypeptide sequences containing deletions, 
insertions or substitutions of amino acid residues which result in a polypeptide that is 
functionally equivalent to the MttA1, MttA2, MttB and MttC polypeptide sequences of Figure 
11 and Figure 7. For example, amino acids may be substituted for other amino acids having 
similar characteristics of polarity, charge, solubility, hydrophobicity, hydrophilicity and/or 
amphipathic nature. Alternatively, substitution of amino acids with other amino acids having 
one or more different characteristics may be desirable for the purpose of producing a 
polypeptide which is secreted from the cell in order to, for example, simplify purification of 
the polypeptide. 

The present invention also specifically contemplates homologs of the amino acid 
sequences of MttA1, MttA2, MttB and MttC. An oligonucleotide sequence which is a 
"homolog" of MttA1 (SEQ ID NO:47), MttA2 (SEQ ID NO:49), MttB (SEQ ID NO:7) and 
MttC (SEQ ID NO:8) is defined herein as an oligonucleotide sequence which exhibits greater 
than or equal to 50% identity to the sequence of MttA1 (SEQ ID NO:47), MttA2 (SEQ ID 
NO:49), MttB (SEQ ID NO:7) and MttC (SEQ ID NO:8), respectively, when sequences 
having a length of 20 amino acids or larger are compared. Alternatively, a homolog of 
MttA1 (SEQ ID NO:47), MttA2 (SEQ ID NO:49), MttB (SEQ ID NO:7) and MttC (SEQ ID 
NO:8) is defined as an oligonucleotide sequence which encodes a biologically active MttA1, 
MttA2, MttB and MttC amino acid sequence, respectively. 

The MttA1, MttA2, MttB and MttC polypeptide sequences of Figures 11 and 7 and 
their functional variants and homologs may be made using chemical synthesis. For example, 
peptide synthesis of the MttA1, MttA2, MttB and MttC polypeptides, in whole or in part, can 
be performed using solid-phase techniques well known in the art. Synthesized polypeptides 
can be substantially purified by high performance liquid chromatography (HPLC) techniques, 
and the composition of the purified polypeptide confirmed by amino acid sequencing. One of 
skill in the art would recognize that variants and homologs of the MttA1, MttA2, MttB and 
MttC polypeptide sequences can be produced by manipulating the polypeptide sequence 
during and/or after its synthesis. 

MttA1, MttA2, MttB and MttC and their functional variants and homologs can also be 
produced by an expression system. Expression of MttA1, MttA2, MttB and MttC may be 
accomplished by inserting the nucleotide sequence encoding MttA1, MttA2, MttB and MttC 
(Figures 11 and 7), its variants, portions, or homologs into appropriate vectors to create 
expression vectors, and transfecting the expression vectors into host cells. 

Expression vectors can be constructed using techniques well known in the art 
[Sambrook et al. (1989) Molecular Cloning, A Laboratory Manual, Cold Spring Harbor 
Press, Plainview NY; Ausubel et al (1989) Current Protocols in Molecular Biology, John 
Wiley & Sons, New York NY]. Briefly, the nucleic acid sequence of interest is placed in 
operable combination with transcription and translation regulatory sequences. Regulatory 
sequences include initiation signals such as start (i.e., ATG) and stop codons, promoters 
which may be constitutive (i.e., continuously active) or inducible, as well as enhancers to 
increase the efficiency of expression, and transcription termination signals. Transcription 
termination signals must be provided downstream from the structural gene if the termination 
signals of the structural gene are not included in the expression vector. Expression vectors 
may become integrated into the genome of the host cell into which they are introduced, or are 
present as unintegrated vectors. Typically, unintegrated vectors are transiently expressed and 
regulated for several hours (e.g., 72 hours) after transfection. 

The choice of promoter is governed by the type of host cell to be transfected with the 
expression vector. Host cells include bacterial, yeast, plant, insect, and mammalian cells. 
Transfected cells may be identified by any of a number of marker genes. These include 
antibiotic (e.g., gentamicin, penicillin, and kanamycin) resistance genes as well as marker or 
reporter genes (e.g., β-galactosidase and luciferase) which catalyze the synthesis of a visible 
reaction product. 

Expression of the gene of interest by transfected cells may be detected either indirectly 
using reporter genes, or directly by detecting mRNA or protein encoded by the gene of 
interest. Indirect detection of expression may be achieved by placing a reporter gene in 
tandem with the sequence encoding one or more of MttA1, MttA2, MttB and MttC under the 
control of a single promoter. Expression of the reporter gene indicates expression of the 
one or more MttA1, MttA2, MttB and MttC sequences placed in tandem. It is preferred that the reporter 
gene have a visible reaction product. For example, cells expressing the reporter gene 
β-galactosidase produce a blue color when grown in the presence of X-Gal, whereas cells 
grown in medium containing luciferin will fluoresce when expressing the reporter gene 
luciferase. 

Direct detection of MttA1, MttA2, MttB and MttC expression can be achieved using
methods well known to those skilled in the art. For example, mRNA isolated from
transfected cells can be hybridized to labelled oligonucleotide probes and the hybridization
detected. Alternatively, polyclonal or monoclonal antibodies specific for MttA1, MttA2,
MttB and MttC can be used to detect expression of the MttA1, MttA2, MttB and MttC
polypeptides using enzyme-linked immunosorbent assay (ELISA), radioimmunoassay (RIA)
and fluorescence-activated cell sorting (FACS).

Those skilled in the art recognize that the MttA1, MttA2, MttB and MttC polypeptide
sequences of the present invention are useful in generating antibodies which find use in
detecting cells that express MttA1, MttA2, MttB and MttC or proteins homologous thereto.
Such detection is useful in the choice of host cells which may be used to target recombinant
twin arginine containing protein expression to cellular membranes or to the periplasm or to
the extracellular medium. Additionally, such detection is particularly useful in selecting host
cells for cytoplasmic or extracellular expression of recombinant twin arginine containing
proteins by disrupting the function of at least one of MttA1, MttA2, MttB and MttC as
described below.

C. Methods for expressing polypeptides to produce soluble proteins

This invention contemplates methods for targeting expression of any gene of interest
(e.g., to the periplasm, cytoplasm, or extracellular medium), thus reducing the likelihood of
expression of insoluble recombinant polypeptides, e.g., in inclusion bodies. The methods of
the invention are premised on the discovery of the proteins MttA1, MttA2, MttB and MttC,
which function as part of a Sec-independent pathway, which target expression of twin
arginine containing proteins to cell membranes, and which direct translocation of such
proteins to the periplasm of gram negative bacteria and to the extracellular medium of cells
which do not contain a periplasm. This discovery makes possible methods for expression of
any gene of interest such that the expressed polypeptide is targeted to the periplasm or
extracellular medium, thereby allowing its expression in a soluble form and thus facilitating
its purification. The methods of the invention contemplate expression of any recombinant
polypeptide as a fusion polypeptide with a twin-arginine signal amino acid sequence as the
fusion partner. Such expression may be accomplished by introducing a nucleic acid sequence
which encodes the fusion polypeptide into a host cell which expresses wild-type MttA1,
MttA2, MttB or MttC, or variants or homologs thereof, or which is engineered to express
MttA1, MttA2, MttB or MttC, or variants or homologs thereof. While expressly
contemplating the use of the methods of the invention for the expression of any polypeptide
of interest, the methods disclosed herein are particularly useful for the expression of
cofactor-containing proteins. The methods of the invention are further described under
(i) Cofactor-containing proteins, (ii) Expression of fusion proteins containing twin-arginine
signal amino acid sequences, and (iii) Construction of host cells containing deletions or
mutations in at least a portion of the genes mttA1, mttA2, mttB and mttC.

i. Cofactor-containing proteins

A strong correlation has been reported between possession of a twin-arginine signal
amino acid sequence in the preprotein and the presence of a redox cofactor in the mature
protein; approximately 40 out of 135 preprotein amino acid sequences which contain a
twin-arginine signal amino acid sequence have been found by Berks [Berks (1996) Molecular
Microbiology 22:393-404; http://www.blackwell-science.com/products/journals/
contents/berks.htm] to result in a mature protein which binds, or can be inferred to bind, a
redox cofactor. The entire contents of Berks are hereby expressly incorporated by reference.
The cofactors associated with a twin-arginine signal amino acid sequence include, but
are not limited to, iron-sulfur clusters, at least two variants of the molybdopterin cofactor,
certain polynuclear copper sites, the tryptophan tryptophylquinone (TTQ) cofactor, and flavin
adenine dinucleotide (FAD). A representative selection of bacterial twin-arginine signal
amino acid sequences is shown in Table 1.



TABLE 1

Organism           Protein  Signal sequence                                            Evidence  Length

I. PERIPLASMIC PROTEINS BINDING IRON-SULFUR CLUSTERS

A. MauM family ferredoxins
P. denitrificans   MauM     MEARMTGRRKVTRRDAMADAARAVGVACLGGFSLAALVRTASPVDA             VH        46
E. coli            NapG     MSRSAKPQNGRRRFLRDVVRTAGGLAAVGVALGLQQQTARA                  VH        41

B. '16Fe' ferredoxin superfamily
E. coli            NrfC     MTWSRRQFLTGVGVLAAVSGTAGRVVA                                VH        27
D. vulgaris        Hmc2     MDRRRFLTLLGSAGLTATVATAGTAKA                                VH        27

C. High potential iron protein (HiPIP)
T. ferrooxidans    Iro      MSEKDKMITRRDALRNIAVVVGSVATTTMMGVGVADA                      EX        37

D. Periplasmically-located [Fe] hydrogenase small subunits
D. vulgaris        HydB     MQIVNLTRRGFLKAACVVTGGALISIRMTGKAVA                         VH        34

E. Periplasmically-located [NiFe] hydrogenase small subunits
E. coli            HyaA     MNNEETFYQAMRRQGVTRRSFLKYCSLAATSLGLGAGMAPKIAWA              EX        45
+M. mazei          VhoG     MSTGTTNLVRTLDSMDFLKMDRRTFMKAVSALGATAFLGTYQTEIVNA           EX        48
D. gigas           HynB     MKCYIGRGKNQVEERLERRGVSRRDFMKFCTAVAVAMGMGPAFAPKVAEA         EX        50
E. coli            HybA     MNRRNFIKAASCGALLTGALPSVSHA                                 VH        26

F. Membrane-anchored Rieske proteins
P. denitrificans   FbcF     MSHADEHAGDHGATRRDFLYYATAGAGTVAAGAAAWTLVNQMNP
+Synechocystis     PetC     MTQISGSPDVPDLGRRQFMTvlLLTFGTITGVAAGALYPAVKYLIP
+S. acidocaldarius SoxF     MDRRTFLRLYLLVGAAIAVAPVIKPALDYVGY

II. PERIPLASMIC PROTEINS BINDING THE MOLYBDOPTERIN COFACTOR

A. Molybdopterin guanine dinucleotide-binding proteins, some of which also bind an iron-sulfur cluster
R. sphaeroides     DmsA     MTKLSGQELHAELSRRAFLSYTAAVGALGLCGTSLLAQGARA                 EX        42
E. coli            BisZ     MTLTRREFIKHSGIAAGALVVTSAAPLPAWA                            VH        31
T. pantotropha     NapA     MTISRRDLLKAQAAGIAAMAANIPLSSQAPA                            VH        31
W. succinogenes    FdhA     MSEALSGRGNDRRKFLKMSALAGVAGVSQAVG                           EX        32
E. coli            DmsA     MKTKIPDAVLAAEVSRRGLVKTTAIGGLAMASSALTLPFSRIAHA              EX        45
H. influenzae      DmsA     MSNFNQISRRDFVKASSAGAALAVSNLTLPFNVMA                        VH        35
S. typhimurium     PhsA     MSISRRSFLQGVGIGCSACALGAFPPGALA                             VH        30

B. Molybdopterin cytosine dinucleotide-binding proteins
P. diminuta        IorB     MKTVLPSVPETVRLSRRGFLVQAGTITCSVAFGSVPA                      VH        37
A. polyoxogenes    Ald      MGRLNRFRLGKDGRREQASLSRRGFLVTSLGAGVMFGFARPSSA               EX        44

III. PERIPLASMIC ENZYMES WITH POLYNUCLEAR COPPER SITES

A. Nitrous oxide reductases
P. stutzeri        NosZ     MSDKDSKNTPQVPEKLGLSRRGFLGASAVTGAAVAATALGGAVMTRESWA         EX        50

B. Multicopper oxidase superfamily
P. syringae        CopA     MESRTSRRTFVKGLAAAGVLGGLGLWRSPSWA                           VH        32
E. coli            SufI     MSLSRRQFIQASGIALCAGAVPLKASA                                VH        27

IV. METHYLAMINE DEHYDROGENASE SMALL SUBUNITS (TRYPTOPHAN TRYPTOPHYLQUINONE COFACTOR)
M. extorquens      MauA     MLGKSQFDDLFEKMSRKVAGHTSRRGFIGRVGTAVAGVALVPLLPVDRRGRVSRANA  EX        57

V. PERIPLASMIC PROTEINS BINDING FLAVIN ADENINE DINUCLEOTIDE
C. vinosum         FccB     MTLNRRDFIKTSGAAVAAVGILGFPHLAFG                             EX        30
+B. sterolicum     ChoB     MTDSRANRADATRGVASVSRRRFLAGAGLTAGAIALSSMSTSASA              EX        45

A more complete listing of bacterial twin-arginine signal amino acid sequences is
available at http://www.blackwell-science.com/products/journals/mole.htm, the entire contents
of which are incorporated by reference. Amino acids with identity to the most preferred
(S/T)-RR-x-F-L-K consensus motif are indicated in bold. Signal sequences are from
Proteobacterial preproteins except where indicated (+). 'Evidence' indicates the method used
to determine the site of protease processing: EX, experimentally determined; VH, inferred
using the algorithm of von Heijne (1987). [1] van der Palen et al. (1995); [2] Richterich et
al. (1993); [3] Hussain et al. (1994); [4] Rossi et al. (1993); [5] Kusano et al. (1992); [6]
Voordouw et al. (1989); [7] Menon et al. (1990); [8] Deppenmeier et al. (1995); [9] Li et al.
(1987); [10] Menon et al. (1994); [11] Kurowski and Ludwig (1987); [12] Mayes and Barber
(1991); [13] Castresana et al. (1995); [14] Hilton and Rajagopalan (1996); [15] Campbell and
Campbell (1996); [16] Berks et al. (1995a); [17] Bokranz et al. (1991); [18] Bilous et al.
(1988); [19] Fleischmann et al. (1995); [20] Heinzinger et al. (1995); [21] Lehmann et al.
(1995); [22] Tamaki et al. (1989); [23] Viebrock and Zumft (1988); [24] Mellano and
Cooksey (1988); [25] Plunkett (1995); [26] Chistoserdov and Lidstrom (1991); [27] Dolata et
al. (1993); [28] Ohta et al. (1991).

In contrast to twin-arginine signal amino acid sequences, Sec signal sequences are
associated with periplasmic proteins binding other redox cofactors, i.e., iron porphyrins
(including the cytochromes c), mononuclear type I or II copper centers, the dinuclear CuA
center, and the pyrrolo-quinoline quinone (PQQ) cofactor.

Currently the assembly of cofactor-containing proteins is limited to the cytoplasm
because the machinery to insert the cofactor is located in this compartment. The present
invention offers the advantage of providing methods for periplasmic and extracellular
expression of cofactor-containing proteins which contain a twin-arginine signal amino acid
sequence, thus facilitating their purification in a functional and soluble form.

ii. Expression of fusion proteins containing twin-arginine signal amino acid sequences

The methods of the invention exploit the inventors' discovery of proteins MttA1,
MttA2, MttB and MttC which are involved in targeting expression of proteins which contain
a twin-arginine amino acid signal sequence to cell membranes and in translocation of such
proteins to the periplasm of gram negative bacteria and the extracellular medium of cells that
do not contain a periplasm. The term "twin-arginine signal amino acid sequence" as used
herein means an amino acid sequence of between 2 and about 200 amino acids, more
preferably between about 10 and about 100 amino acids, and most preferably between about
25 and about 60 amino acids, and which comprises the amino acid sequence, from the
N-terminal to the C-terminal, A-B-C-D-E-F-G, wherein the amino acid at position B is Arg, and
the amino acid at position C is Arg. The amino acid at positions A, D, E, F, and G can be
any amino acid. However, the amino acid at position A preferably is Gly, more preferably is
Glu, yet more preferably is Thr, and most preferably is Ser. The amino acid at position D
preferably is Gln, more preferably is Gly, yet more preferably is Asp, and most preferably is
Ser. The amino acid at position E preferably is Leu and more preferably is Phe. The amino
acid at position F preferably is Val, more preferably is Met, yet more preferably is Ile, and
most preferably is Leu. The amino acid at position G preferably is Gln, more preferably is
Gly and most preferably is Lys. In one preferred embodiment, the twin-arginine amino acid
signal sequence is Ser-Arg-Arg-Ser-Phe-Leu-Lys (SEQ ID NO:41). In yet another preferred
embodiment, the twin-arginine amino acid signal sequence is Thr-Arg-Arg-Ser-Phe-Leu-Lys
(SEQ ID NO:42).
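
The A-B-C-D-E-F-G definition above lends itself to a simple sequence scan. The following is a minimal sketch, not part of the disclosed methods: it finds every seven-residue window with Arg at positions B and C and assigns an illustrative score based on the stated order of preference at positions A, D, E, F and G. The function name, the scoring scheme and the use of one-letter amino acid codes are assumptions made for illustration only.

```python
import re

# Preferred residues per position, ordered from "preferred" to "most preferred",
# transcribed from the definition in the text (one-letter amino acid codes).
PREFERRED = {
    "A": "GETS",  # Gly < Glu < Thr < Ser
    "D": "QGDS",  # Gln < Gly < Asp < Ser
    "E": "LF",    # Leu < Phe
    "F": "VMIL",  # Val < Met < Ile < Leu
    "G": "QGK",   # Gln < Gly < Lys
}

# A-B-C-D-E-F-G with Arg fixed at B and C; the lookahead lets overlapping windows match.
MOTIF = re.compile(r"(?=([A-Z]RR[A-Z]{4}))")


def find_twin_arginine_motifs(protein):
    """Return (start_index, heptamer, score) for every x-R-R-x-x-x-x window."""
    hits = []
    for match in MOTIF.finditer(protein.upper()):
        heptamer = match.group(1)
        a, _, _, d, e, f, g = heptamer
        score = 0
        for position, residue in zip("ADEFG", (a, d, e, f, g)):
            # Hypothetical score: rank in the preference list (1 = preferred), 0 if not listed.
            score += PREFERRED[position].find(residue) + 1
        hits.append((match.start(), heptamer, score))
    return hits


if __name__ == "__main__":
    # E. coli SufI signal sequence from Table 1.
    print(find_twin_arginine_motifs("MSLSRRQFIQASGIALCAGAVPLKASA"))
```

Under this hypothetical scoring, the sequence of SEQ ID NO:41 (SRRSFLK) receives the maximum score because every variable position carries its most preferred residue.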

The invention contemplates expression of wild-type polypeptide sequences which
contain a twin-arginine amino acid signal sequence as part of a preprotein. To date, 135
polypeptide sequences have been reported to contain a twin-arginine amino acid signal
sequence motif [Berks (1996) Molecular Microbiology 22:393-404;
http://www.blackwell-science.com/products/journals/contents/berks.htm, the entire contents
of which are incorporated by reference].

The invention further contemplates expression of recombinant polypeptide sequences
which are engineered to contain a twin-arginine amino acid signal sequence as part of a
fusion protein. Fusion proteins containing one or more twin-arginine amino acid signal
sequences may be made using methods well known in the art. For example, one of skill in
the art knows that nucleic acid sequences which encode a twin-arginine amino acid signal
sequence may be operably ligated in frame (directly, or indirectly in the presence of
intervening nucleic acid sequences) to a nucleotide sequence which encodes a polypeptide of
interest. The ligated nucleotide sequence may then be inserted in an expression vector which
is introduced into a host cell for expression of a fusion protein containing the polypeptide of
interest and the twin-arginine amino acid signal sequence.

Fusion proteins containing twin-arginine amino acid signal sequences are expected to
be targeted to the periplasm or extracellular medium by the MttA1, MttA2, MttB and MttC
proteins of the invention and by variants and homologs thereof. Keon and Voordouw [Keon
and Voordouw (1996) Anaerobe 2:231-238] have reported that a fusion protein containing E.
coli alkaline phosphatase (phoA) linked to a signal amino acid sequence from the Hmc
complex of Desulfovibrio vulgaris subsp. vulgaris was exported to the E. coli periplasm.
Similarly, a fusion protein containing a hydrogenase signal peptide fused to β-lactamase from
which the signal peptide had been removed led to export in E. coli under both aerobic and
anaerobic conditions [Niviere et al. (1992) J. Gen. Microbiol. 138:2173-2183].
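
The in-frame ligation described above can be illustrated with a short sketch. It joins a signal-encoding sequence, an optional intervening sequence and a coding sequence of interest, checks that the reading frame is preserved, and translates the result with the standard genetic code. The function names and the example DNA sequences are hypothetical and chosen only for illustration; they are not sequences disclosed herein.

```python
# Standard genetic code, codons enumerated with bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding sequence from its first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def fuse_in_frame(signal_cds, gene_cds, linker=""):
    """Join signal DNA, optional intervening DNA and the gene of interest in one frame."""
    signal_cds, linker, gene_cds = signal_cds.upper(), linker.upper(), gene_cds.upper()
    if not signal_cds.startswith("ATG"):
        raise ValueError("signal sequence must begin with a start codon")
    upstream = signal_cds + linker
    if len(signal_cds) % 3 or len(linker) % 3:
        raise ValueError("signal and linker lengths must be multiples of 3 to keep the frame")
    if len(translate(upstream)) != len(upstream) // 3:
        raise ValueError("stop codon upstream of the gene of interest")
    return upstream + gene_cds


if __name__ == "__main__":
    # Hypothetical DNA encoding Met followed by the heptamer SRRSFLK, fused to a toy gene.
    fusion = fuse_in_frame("ATGTCTCGTCGTTCTTTCCTGAAA", "GCTGAATAA")
    print(translate(fusion))
```

A linker whose length is not a multiple of three is rejected, which mirrors the requirement that intervening nucleic acid sequences must not shift the reading frame of the polypeptide of interest.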

Fusion proteins which contain twin-arginine amino acid signal sequences are also
expected to be cleaved to generate a mature protein from which the twin-arginine amino acid
signal sequence has been cleaved. Two signal peptidases have so far been identified in E.
coli: signal peptidase I and signal peptidase II. Signal peptidase II, which has a unique
cleavage site involving a cysteine residue [Bishop et al. (1995) J. Biol.
Chem. 270:23097-23103], is believed not to participate in cleavage of twin-arginine amino
acid signal sequences. Rather, signal peptidase I, which cleaves Sec signal sequences, has
been suggested by Berks to cleave twin-arginine amino acid signal sequences. Berks also
suggested that signal peptidase I has the same recognition site in Sec signal sequences as in
twin-arginine amino acid signal sequences [Berks (1996)]. This suggestion was based on (a)
the "-1/-3" rule for Sec signal peptidase in which the major determinant of signal peptidase
processing is the presence of amino acids with small neutral side-chains at positions -1 and -3
relative to the site of cleavage, and (b) the good agreement between the cleavage site of
twin-arginine amino acid signal sequences as determined using the "-1/-3" rule (with the invariant
arginine at the N-terminus of the signal sequence, i.e., position B in the A-B-C-D-E-F-G
sequence, designated as position zero) and the experimentally determined amino terminus of
the mature protein [Berks (1996)]. Evidence presented herein (Example 9) further confirms
cleavage of twin-arginine amino acid signal sequences to release a mature protein which lacks
the twin-arginine amino acid signal sequence.
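
The "-1/-3" rule described above can be expressed as a simple check on the last three residues of a signal sequence. This is a minimal sketch only: the set of residues treated as having "small neutral side-chains" (Ala, Gly, Ser, Thr) is an assumption made for illustration and is narrower than the full von Heijne algorithm, and the function name is hypothetical.

```python
# Assumption for illustration: residues counted as "small neutral" at positions -1 and -3.
SMALL_NEUTRAL = set("AGST")


def obeys_minus1_minus3_rule(signal_peptide):
    """Check positions -1 and -3 of a signal peptide.

    `signal_peptide` is the leader that is removed by cleavage, so its last
    residue is position -1 and the residue two places earlier is position -3.
    """
    if len(signal_peptide) < 3:
        return False
    return signal_peptide[-1] in SMALL_NEUTRAL and signal_peptide[-3] in SMALL_NEUTRAL


if __name__ == "__main__":
    # E. coli SufI and DmsA signal sequences as listed in Table 1.
    for name, leader in [
        ("SufI", "MSLSRRQFIQASGIALCAGAVPLKASA"),
        ("DmsA", "MKTKIPDAVLAAEVSRRGLVKTTAIGGLAMASSALTLPFSRIAHA"),
    ]:
        print(name, obeys_minus1_minus3_rule(leader))
```

Widening SMALL_NEUTRAL (for example to admit other uncharged residues at position -3) changes which sequences pass, so the set should be treated as a tunable assumption rather than a fixed property of signal peptidase I.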



iii. Construction of host cells containing deletions or mutations in at least a
portion of the genes mttA, mttB and mttC

The function of any portion of E. coli MttA1, MttA2, MttB and MttC
polypeptides and variants and homologs thereof, as well as the function of any polypeptide
which is encoded by a nucleotide sequence that is a variant or homolog of the mttA1, mttA2,
mttB and mttC sequences disclosed herein, may be demonstrated in any host cell by in vivo
homologous recombination of chromosomal sequences which are variants or homologs of
mttA1, mttA2, mttB and mttC using previously described methods [Sambasivarao et al. (1991)
J. Bacteriol. 173:5935-5943; Jasin et al. (1984) J. Bacteriol. 159:783-786]. Briefly, the nucleotide
sequence whose function is to be determined is cloned into vectors, and the gene is mutated,
e.g., by insertion of a nucleotide sequence within the coding region of the gene. The
plasmids are then homologously recombined with chromosomal variants or homologs of
mttA1, mttA2, mttB or mttC sequences in order to replace the chromosomal variants or
homologs of mttA1, mttA2, mttB or mttC genes with the mutated genes of the vectors. The
effect of the mutations on the localization of proteins containing twin-arginine amino acid
signal sequences is compared between the wild-type host cells and the cells containing the
mutated mttA1, mttA2, mttB or mttC genes. The localization (e.g., cytoplasm, periplasm, cell
membranes, extracellular medium) of expressed twin arginine containing proteins is compared
using methods disclosed herein (e.g., functional enzyme activity and Western blotting)
between homologously recombined cells and control cells which had not been homologously
recombined. Localization of expressed twin arginine containing proteins extracellularly, in
the periplasm, or in the cytoplasm of homologously recombined cells as compared to
localization of expression in cell membranes of control cells demonstrates that the wild-type
MttA1, MttA2, MttB or MttC protein whose function had been modified by homologous
recombination functions in targeting expression of the twin arginine containing protein to the
cell membrane. Similarly, accumulation of expressed twin arginine containing proteins in
extracellular medium, in the cytoplasm, or in cell membranes of homologously recombined
cells as compared to periplasmic localization of the expressed twin arginine containing protein
in control cells which had not been homologously recombined indicates that the protein (i.e.,
MttA1, MttA2, MttB or MttC) whose function had been modified by homologous
recombination functions in translocation of the twin arginine containing protein to the
periplasm.

EXPERIMENTAL

The following examples serve to illustrate certain preferred embodiments and aspects
of the present invention and are not to be construed as limiting the scope thereof. The strains
and plasmids used in this investigation are listed in Table 2.

TABLE 2

Bacteria and Plasmids used in this Investigation

Strain/Plasmid  Genotype or Gene Combinations Present                          Reference/Source

HB101           F-, hsdS20(rB- mB-), leu, supE44, ara14, galK2, lacY1,         Boyer and Roulland-Dussoix, 1969
                proA2, rpsL20, xyl-5, mtl-1, recA13, mcrB
[illegible]     [illegible] ΔlacZM15                                           Amersham Corp.
D43             HB101; mttA                                                    Bilous and Weiner, 1985
pBR322          cloning vector Tet^r, Amp^r                                    Pharmacia
pTZ18R          cloning vector Amp^r, lacZ                                     Pharmacia
pJBS633         blaM fusion vector                                             Broome-Smith and Spratt, 1986
pFRD84          frdABCD cloned into pBR322                                     Lemire et al., 1982
pFRD117         ΔfrdCD version of pFRD84                                       Lemire et al., 1982
pDMS160         dmsABC cloned into pBR322                                      Rothery and Weiner, 1991
pDMS223         dmsABC operon in pTZ18R                                        Rothery and Weiner, 1991
pDMSL71         dmsABC::blaM in pJBS633 fusion after residue 12                Weiner et al., 1993
pDMSL5          dmsABC::blaM in pJBS633 fusion after residue 216               Weiner et al., 1993
pDMSL29         dmsABC::blaM in pJBS633 fusion after residue 229               Weiner et al., 1993
[illegible]     dmsABC::blaM in pJBS633 fusion after residue 267               Weiner et al., 1993
pDMSC59X        dmsC truncate after residue 59                                 Sambasivarao and Weiner, 1991
[illegible]     yig[illegible] in pBR322                                       This investigation
pGS20           b3835, b3836, b3837, and b3838 in pBR322                       This investigation
pTZmttABC       region of ORFs b3836, b3838, yigU, yigW, cloned into pTZ18R    This investigation
pBRmttABC       region of ORFs b3836, b3838, yigU, yigW, cloned into pBR322    This investigation
pTZb3836        ORF b3836 cloned into pTZ18R                                   This investigation
pBRb3836        ORF b3836 cloned into pBR322                                   This investigation

EXAMPLE 1

Isolation And Properties of D-43 Mutants Defective In DmsABC Targeting 



DMSO reductase is a "twin arginine" trimeric enzyme composed of an extrinsic
membrane dimer with catalytic, DmsA, and electron transfer, DmsB, subunits bound to an
intrinsic anchor subunit, DmsC. The DmsA subunit has a "twin arginine" leader but it has
been exhaustively shown that the DmsA and DmsB subunits face the cytoplasm [Rothery and
Weiner (1996) Biochem. 35:3247-3257; Rothery and Weiner (1993) Biochem. 32:5855-5861;
Sambasivarao et al. (1990) J. Bacteriol. 172:5938-5948; Weiner et al. (1992) Biochem.
Biophys. Acta 1102:1-18; Weiner et al. (1993) J. Biol. Chem. 268:3238-3244].

In order to isolate an E. coli mutant defective in membrane targeting of DmsABC,
pleiotropic mutants which were unable to grow on DMSO were produced by nitrosoguanidine
mutagenesis of HB101 and the growth rates on DMSO of both the mutants and HB101 were
determined. Mutant D-43, which grew anaerobically on fumarate and nitrate, nevertheless
failed to grow on DMSO or TMAO. These results are further described in the following
sections.

A. Isolation of mutant 

Nitrosoguanidine mutagenesis and ampicillin enrichment were as described by Miller
(1992) in A Short Course in Bacterial Genetics, Cold Spring Harbor Laboratory Press.
Sixteen mutants were isolated that were defective for anaerobic growth on DMSO but grew
with nitrate or fumarate as the alternate electron acceptor. Each of the mutants was
transformed with pDMS160 [Rothery and Weiner (1991) Biochem. 30:8296-8305] carrying
the entire dms operon and again tested for growth on DMSO. All of the transformants failed
to grow on DMSO. When tested for DMSO reductase activity, 14 of the 16 transformants
lacked measurable enzyme activity. Two of the mutants expressed high levels of DMSO
reductase activity but the activity was localized in the cytoplasm rather than the membrane
fraction. One of these mutants, D-43, was chosen for further study.



B. Anaerobic growth rates of HB101 and D-43

For growth experiments, bacteria were initially grown aerobically overnight at 37°C in
LB plus 10 µg/ml vitamin B1. A 1% inoculum was added to 150 ml of minimal salts
medium containing 0.8% (w/v) glycerol, 10 µg/ml each of proline, leucine and vitamin B1, and
0.15% peptone, and supplemented with either 70 mM DMSO, 35 mM fumarate, 40 mM
nitrate, or 100 mM trimethylamine N-oxide (TMAO). Cultures were grown anaerobically at
37°C in Klett flasks and the turbidity monitored in a Klett spectrophotometer with a No. 66
filter.

The rates of anaerobic growth of strains HB101 and D-43 with a range of electron
acceptors and a nonfermentable carbon source, glycerol, were compared. The results are
shown in Figure 1.

All the terminal electron acceptors tested supported the growth of the parent HB101
(Figure 1a). In contrast, only nitrate and fumarate stimulated the growth rate of the mutant
(Figure 1b). However, even in the presence of nitrate and fumarate the growth yield was half
that of strain HB101. The reduced growth rate may reflect the pleiotropic effects of the
mutation on various metabolic reactions needed for optimal growth in addition to the terminal
electron transfer reaction. Only DmsABC supports growth on DMSO whereas both DmsABC
and the periplasmic TMAO reductase support growth on TMAO [Sambasivarao and Weiner
(1991) J. Bacteriol. 173:5935-5943]. The observation that D-43 is unable to grow on either
DMSO or TMAO indicates that both of these enzymes were non-functional.


EXAMPLE 2

DmsA Is Not Anchored To the Membrane In D-43

Previous studies have exhaustively shown that DmsABC is localized on the
cytoplasmic membrane of wild-type E. coli strains with the DmsAB subunits anchored to the
cytoplasmic surface [Rothery and Weiner (1996) Biochem. 35:3247-3257; Rothery and
Weiner (1993) Biochem. 32:5855-5861; Sambasivarao et al. (1990) J. Bacteriol.
172:5938-5948; Weiner et al. (1992) Biochem. Biophys. Acta 1102:1-18; Weiner et al. (1993) J. Biol.
Chem. 268:3238-3244]. In order to determine the localization of DmsABC in D-43 mutants,
cell fractions were assayed for the presence of DmsA and DmsB by immunoblot analysis, and
for DMSO reductase activity as follows.

A. Functional enzyme activity assays

Cell fractions were assayed for DMSO reductase activity by measuring the
DMSO-dependent oxidation of reduced benzyl viologen at 23°C [Bilous and Weiner (1985) J.
Bacteriol. 162:1151-1155]. This assay is dependent only on the presence of DmsAB.

To test the localization of DmsABC in D-43, enzyme activity in the soluble fraction
and membrane band fraction of HB101/pDMS160 and of D-43/pDMS160 was determined.
250 ml anaerobic cultures of HB101/pDMS160 and D-43/pDMS160 were grown on Gly/Fum
medium. HB101/pDMS160 yielded 114 mg total protein, 3240 units of membrane-bound
TMAO reductase activity, and 2900 units of soluble activity. D-43/pDMS160 yielded 99 mg
total protein, of which 320 units were membrane-bound and 4000 units were soluble. Thus, although
the total DmsABC activity was lower in D-43 (4300 total units compared to 6200 for
HB101/pDMS160), the vast majority was not targeted to the membrane. This suggested that
D-43 was defective in targeting to the membrane rather than in a biosynthetic step.
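
The distribution of activity between fractions can be made explicit with a short calculation. The sketch below uses only the unit values reported above; the function name is hypothetical, and the exact sums (6140 and 4320 units) differ slightly from the approximate totals quoted in the text.

```python
def membrane_fraction(membrane_units, soluble_units):
    """Return (total units, fraction of the total that is membrane-bound)."""
    total = membrane_units + soluble_units
    return total, membrane_units / total


# Activity values as reported for the 250 ml anaerobic cultures.
REPORTED = [
    ("HB101/pDMS160", 3240, 2900),
    ("D-43/pDMS160", 320, 4000),
]

if __name__ == "__main__":
    for strain, membrane, soluble in REPORTED:
        total, fraction = membrane_fraction(membrane, soluble)
        print(f"{strain}: total {total} units, {fraction:.1%} membrane-bound")
```

On these figures roughly half of the activity is membrane-bound in the parent strain, compared with well under one tenth in D-43, which is the basis for the conclusion that the defect lies in membrane targeting.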

B. Western blot analysis of DmsA and DmsB

To determine the cellular locations of DmsA and DmsB by Western blots,
D-43/pDMS160 and HB101/pDMS160 were grown anaerobically on Gly/fumarate medium at
37°C in 19 l batches [Bilous and Weiner (1985) J. Bacteriol. 162:1151-1155]. Cultures were
grown for 24 h at 37°C and the cells harvested and membranes prepared by French pressure
cell lysis at 16,000 psi followed by differential centrifugation as previously described
[Rothery and Weiner (1991) Biochem. 30:8296-8305]. The crude membranes were washed
twice with lysis buffer (50 mM MOPS, 5 mM EDTA pH 7.0). DmsABC was purified as
described by Simala-Grant and Weiner (1996) Microbiology 142:3231-3239. For the
determination of subunit anchoring to the membrane, membrane preparations were first
washed with lysis buffer and then with lysis buffer containing 1 M NaCl. The osmotic shock
procedure of Weiner and Heppel [(1971) J. Biol. Chem. 246:6933-6941] was used to isolate
the periplasmic fraction tested for fumarate and DMSO reductase polypeptides.

For Western blot analysis, antibodies to purified DmsA and DmsB were used
[Sambasivarao et al. (1990) J. Bacteriol. 172:5938-5948]. Typically, samples were separated
on 10% (w/v) SDS-PAGE and then blotted onto nitrocellulose. The protein bands were
detected using the enhanced chemiluminescence detection system from Amersham and goat
anti-rabbit IgG (H+L) horseradish peroxidase conjugate. The results are shown in Figure 2.

Figure 2 shows a Western blot of washed membranes and soluble fractions of HB101
and D-43 harboring pDMS160 expressing DmsABC. The blot was probed with either
purified anti-DmsA or anti-DmsB. S, soluble fraction; M, washed membranes; sM, salt
washed membranes; sS, soluble fraction from the salt washed membranes; P, purified
DmsABC. Figure 2 clearly shows that DmsA is not targeted to the membrane in D-43. The
DmsA polypeptide was expressed and was present in the cytoplasm at levels equivalent to the
wild-type. Equivalent samples probed with anti-DmsB demonstrated that significant amounts
of DmsB were targeted to the membrane. Membrane incorporation of DmsC in the absence
of DmsAB is lethal [Turner et al. (1997) Prot. Engineering 10:285-290] and the presence of
DmsB on the membrane may overcome the lethality normally associated with incorporation of
DmsC in the absence of the catalytic subunits.

EXAMPLE 3 
DmsC Is Anchored To the Membrane In D-43 



Because polyclonal antibodies against DmsC could not successfully be raised
[Sambasivarao et al. (1990) J. Bacteriol. 172:5938-5948; Turner et al. (1997) Prot.
Engineering 10:285-290], three BlaM (β-lactamase) fusions were used to determine whether
the anchor subunit is translated and correctly inserted into the membranes of D-43 [Weiner et
al. (1993) J. Biol. Chem. 268:3238-3244]. These fusions were located after amino acid
positions 216, 229 and 267 of DmsC. Fusion 216 was localized to the periplasm and
mediated very high resistance. Fusions 229 and 267 were localized to the seventh and eighth
transmembrane helices and mediated intermediate levels of resistance [Weiner et al. (1993) J.
Biol. Chem. 268:3238-3244]. The minimal inhibitory concentrations of ampicillin for each
of these fusions expressed in D-43 under anaerobic growth conditions were the same or
within one plate dilution of the wild-type values. Additionally, Western blots, using antibody
directed against BlaM, of cell fractions of membrane, cytoplasmic and osmotic shock fluids of
D-43/pDMSL29 (fusion at amino acid 229) showed DmsC-BlaM in the membrane fractions
(results not shown). These data suggest that the DmsC protein is translated and inserted into
the membrane and has the same topology as that found in wild-type E. coli cells.

EXAMPLE 4 

Enzyme Activity Of Nitrate Reductase and Trimethylamine N-Oxide Reductase With A
Twin Arginine Signal Sequence Is Not Targeted To the Periplasm Of D-43 While
Enzyme Activity of Nitrite Reductase With A Sec-Signal Sequence Is Present In the
Periplasm Of D-43

In order to determine whether the mutation in D-43 (which resulted in failure to
anchor DmsA and DmsB to the cell membrane as described above) selectively prevented
membrane targeting of proteins with a twin-arginine signal amino acid sequence, the enzyme
activity of periplasmic enzymes having a twin-arginine signal amino acid sequence (i.e.,
nitrate reductase (NapA) and trimethylamine N-oxide reductase (TorA)) and of a periplasmic
enzyme having a Sec-leader sequence (i.e., nitrite reductase (NrfA)) was determined in the
periplasm of D-43 and HB101.

E. coli can reduce nitrate to ammonia using two periplasmic electron transfer chains,
the Nap and Nrf pathways [Grove et al. (1996) Mol. Microbiol. 19:467-481; Cole (1996)
FEMS Microbiol. Letts. 136:1-11]. The catalytic subunit of the periplasmic nitrate reductase,
NapA, is a large molybdoprotein with similarity to DmsA and is synthesized with a
twin-arginine signal amino acid sequence. NrfA, the periplasmic nitrite reductase, is not a
molybdoprotein but a c-type cytochrome and contains a Sec-leader peptide. Accumulation of
both of these redox enzymes in the periplasm of strain D-43 was assayed by staining the
periplasmic proteins separated by PAGE with reduced methyl viologen in the presence of
nitrate and nitrite as follows.

Periplasmic proteins were released from washed bacterial suspensions as described by
McEwan et al. (1984) Arch. Microbiol. 137:344-349 except that the EDTA concentration was
5 mM. The periplasmic fraction was dialyzed against two changes of a 20-fold excess of 10
mM Na+/K+ phosphate, pH 7.4, to remove sucrose and excess salt, freeze dried and dissolved
in 10 mM phosphate, pH 7.4, to a protein concentration of about 15 mg ml⁻¹. Protein
concentrations were determined by the Folin phenol method described previously [Newman
and Cole (1978) J. Gen. Microbiol. 106:1-12]. The periplasmic proteins were separated on a
7.5% non-denaturing polyacrylamide gel. After electrophoresis, the 18 cm square gel was
immersed in 5 µg ml⁻¹ methyl viologen containing 5 mM nitrate. Dithionite was added to
keep the viologen reduced; bands of activity were detected as transparent areas against a dark
purple background. The same protocol was used to detect periplasmic nitrite and TMAO
reductase activity, but 5 mM nitrate was replaced by 2.5 mM nitrite or 5 mM TMAO,
respectively. The results are shown in Figure 3.

Figure 3a shows a nitrate-stained polyacrylamide gel containing periplasmic proteins,
membrane proteins and cytoplasmic proteins from HB101 and D-43. Lanes 1) and 2) contain
periplasmic proteins from HB101 and D-43, respectively. Lanes 3) and 4) contain membrane
proteins from HB101 and D-43, respectively, and lanes 5) and 6) contain soluble cytoplasmic
proteins from HB101 and D-43, respectively. Figure 3b shows a nitrite-stained polyacrylamide
gel containing periplasmic proteins from 1) HB101 and 2) D-43. Approximately 30 µg of
protein was loaded into each lane. Figure 3c shows a TMAO-stained polyacrylamide gel
containing periplasmic proteins from 1) HB101 and 2) D-43.

The results in Figure 3 show that nitrate reductase activity due to NapA was present in
the periplasmic proteins extracted from the parental strain HB101 but was not observed in
periplasmic proteins prepared from strain D-43 (Figure 3a). In contrast, activity of NrfA, the
c-type cytochrome nitrite reductase, was similar in periplasmic proteins prepared from both
HB101 and D-43 (Figure 3b). Significantly, the nitrate reductase activity was higher in
membranes prepared from strain D-43 than in membranes prepared from the parental strain
HB101, suggesting that NapA protein was "stuck" in the membrane fraction. No nitrate
reductase activity was detected in soluble cytoplasmic proteins prepared from either strain
(data not shown).

Additionally, the rate of electron transfer from physiologic electron donors to NrfA
was measured by assaying the rate of nitrite reduction by a suspension of whole cells in the
presence of formate or glycerol. The effects of the mutation on periplasmic nitrite reductase
activity provided a key control to test whether MttA2 plays a major role in protein targeting.
Nrf activity can be assessed in two ways: by detecting the activity of the terminal nitrite
reductase, which is a c-type cytochrome secreted by the Sec pathway and assembled in the
periplasm (Figure 3b) [Thony-Meyer and Kunzler (1997) Eur. J. Biochem. 246:794-799], and
by measuring the rate of nitrite reduction by washed bacteria in the presence of the
physiologic substrate, formate. Only the latter activity requires the membrane-bound
iron-sulfur protein, NrfC, which is synthesized with an N-terminal twin-arginine signal amino acid
sequence.

The rate of nitrite reduction in suspensions of strain HB101 was 34 µmol nitrite
reduced min⁻¹ ml⁻¹ while that measured with suspensions of D-43 was 11 µmol nitrite
reduced min⁻¹ ml⁻¹. These results show that cytochrome c552 was correctly targeted in the
mutant and able to catalyse nitrite reduction with dithionite-reduced methyl viologen as the
artificial electron donor, but strain D-43 was deficient in formate-dependent nitrite reductase
activity.
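
As a rough worked illustration of the comparison above, the residual formate-dependent rate in D-43 can be expressed as a fraction of the HB101 rate. The sketch below is an editorial aid and not part of the reported protocol; the function name is hypothetical, and the D-43 value is read as 11 (the figure is partly garbled in the source text).

```python
def residual_activity(mutant_rate: float, parent_rate: float) -> float:
    """Return the mutant rate as a fraction of the parental rate."""
    if parent_rate <= 0:
        raise ValueError("parent_rate must be positive")
    return mutant_rate / parent_rate


# Whole-cell nitrite reduction rates quoted in the text (same units for both).
HB101_RATE = 34.0  # parental strain
D43_RATE = 11.0    # mutant strain; assumed reading of the garbled source value

fraction = residual_activity(D43_RATE, HB101_RATE)
print(f"D-43 retains about {fraction:.0%} of the HB101 rate")
```

Under that reading, D-43 retains roughly one third of the parental formate-dependent rate, which is consistent with the description of the strain as deficient rather than completely inactive.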

Loss of electron transport to NrfA from physiologic electron donors, but not from
reduced methyl viologen, was probably due to the presence of a twin-arginine signal amino
acid sequence motif in either NrfC, which is a protein essential for the transfer of electrons
from quinones to NrfA [Hussain et al. (1996) Mol. Microbiol. 12:153-163], or in FdnG, which
contributes to the transfer of electrons from formate to nitrite [Darwin et al. (1993) J. Gen.
Microbiol. 139:1829-1840].

Trimethylamine N-oxide reductase (TorA) is another periplasmic terminal reductase
related to DmsA [Mejean et al. (1994) Mol. Microbiol. 11:1169-1179] which contains a
twin-arginine signal amino acid sequence. In strain D-43 this enzyme activity was not observed in
the periplasmic protein fraction (Figure 3c).



EXAMPLE 5 

MttA2 Protein Targets DmsAB To The Membrane And Does Not Translocate DmsAB
To The Periplasm



In order to determine whether MttA2 is involved in targeting DmsAB to the
membrane rather than in the translocation of DmsAB to the periplasm, and whether the role
of DmsC is to prevent translocation of DmsAB to the periplasm, the intracellular location was
examined in HB101 and D-43 for the DmsA and DmsB subunits expressed from a plasmid
encoding the wild-type DmsABC operon as well as a truncated form lacking the anchor
subunit DmsC. The results are shown in Figure 4.

Figure 4 shows a Western blot of DmsAB. Figure 4A shows HB101 expressing either
native DmsABC (pDMS160), DmsABΔC (pDMSC59X), or FrdABΔCD. Figure 4B shows
equivalent lanes as in Figure 4A, with the same plasmids in D-43. P: purified or enriched
sample protein of either DmsABC or FrdAB; M: washed membranes; S: soluble fraction; O:
osmotic shock fraction; 2O: 2-fold osmotic shock fraction. Purified FrdAB was obtained
from HB101/pFRD84 expressing high levels of the wild-type enzyme and purified by the
method of Dickie and Weiner [(1979) Can. J. Biochem. 57:813-821] and Lemire and Weiner
[(1986) Meth. Enzymol. 126:377-386]. All lanes had the equivalent concentration of protein
loaded.

As shown in Figure 4A (compare lanes 8 and 9 to lanes 4 and 5), significant amounts
of DmsA and DmsB accumulated in the periplasm only when the DmsC subunit was absent.
As a control for this experiment, plasmids carrying the intact frdABCD (pFRD84) (not
shown) and truncated frdAB (pFRD117) [Lemire et al. (1982) J. Bacteriol. 152:1126-1131]
lacking the anchor subunits of fumarate reductase were also expressed. As fumarate reductase
does not have a twin-arginine signal amino acid sequence and assembles spontaneously in the
membrane [Latour and Weiner (1987) J. Gen. Microbiol. 133:597-607], neither an Mtt
mutation nor loss of the anchor subunits, FrdC and FrdD, should result in secretion of FrdAB
into the periplasm. This was confirmed (lanes 13 and 14). In Figure 4B the same
experiment is shown for strain D-43. As expected, neither DmsA nor DmsB accumulated in
the periplasm.

These results demonstrate that MttA is not involved in the translocation of DmsAB to 
the periplasm but in targeting them to the membrane. These results also suggest that the role 
of DmsC is to prevent translocation of DmsAB to the periplasm. 

EXAMPLE 6 

Plasmid Complementation Of D-43 And Sequencing Of The mttA Region 

Complementation of the D-43 mutant with plasmid pDMS160 (which carries the wild- 
type DmsABC operon) was carried out to determine whether the mutation was located within 
or outside the DmsABC structural gene. 

A. Plasmid complementation of mutant D-43 

For initial complementation experiments, an E. coli DNA library was prepared by
HindIII digestion of an E. coli HB101 chromosomal DNA preparation and ligation into the
HindIII site of pBR322. The ligation mixture was transformed directly into D-43. The
transformants were plated on glycerol/DMSO (Gly/DMSO) plates and incubated
anaerobically at 37°C for 72 hr. The complementing clone identified from this library,
pDSR311, was isolated and restriction mapped. The map was compared with the integrated
E. coli restriction map version 6 [Berlyn et al. (1996) Edition 9 in Escherichia coli and
Salmonella 2:1715-1902, ASM Press, Washington DC].

A second gene bank was prepared using random 5-7 kb Sau3A fragments of E. coli
W1485 ligated into the BamHI site of pBR322. This E. coli gene bank was a gift from Dr.
P. Miller, Parke-Davis Pharmaceuticals, Ann Arbor, MI. D-43 was transformed with 2 µg of
this library and transformants were plated onto Luria-Bertani (LB) broth plates containing 100
µg ml⁻¹ ampicillin. After overnight growth at 37°C the cells were washed off the plates into 5
ml of LB broth and 20 µl of this suspension was diluted with 10 ml of Minimal A medium
[Miller (1992) in A Short Course in Bacterial Genetics, Cold Spring Harbor Laboratory
Press] containing 100 µg ml⁻¹ ampicillin and 10 µg ml⁻¹ vitamin B1, proline and leucine and
grown aerobically at 37°C for 16 hr. The cells were washed twice in phosphate buffered
saline (PBS) and samples were serially diluted into PBS buffer. Each dilution (100 µl) was
plated on Gly/DMSO plates and incubated anaerobically at 37°C for 72 hr. Colonies were
further tested for anaerobic growth in 9 ml screw-top test tubes containing Gly/DMSO broth
medium.

The location of the complementing clones in the E. coli chromosome obtained from
both libraries was confirmed by DNA sequencing the ends of the clones using primers which
flanked the HindIII and BamHI sites of pBR322. Subclones of the complementing clones
from each of the libraries were constructed utilizing standard cloning methods [Sambrook et
al. (1989)] and ligated into the cloning vector pTZ18R. DNA from subclones was restriction
mapped to verify the insert. Positive subclones were tested for anaerobic growth in
Gly/DMSO and Gly/Fumarate broth medium.

A single clone, pDSR311, which allowed growth on Gly/DMSO was identified.
Through restriction map analysis and sequencing the ends of the insert, the clone was mapped
to the 88 min region of the chromosome, within contig AE00459 covering the 4,013,851 -
4,022,411 bp region of the sequence of Blattner et al. [Blattner et al. (1997) Science
277:1453-1462]. The clone contained the previously undefined open reading frames yigO, P,
R, T, and U (based on the original yig nomenclature for unidentified ORFs) (Figure 5).

All attempts to use available restriction sites to subclone this region into ORF groups
yigOP, yigR, yigRTU, and yigTU were unsuccessful. Therefore, a second library consisting of
E. coli chromosomal DNA which had been partially digested with Sau3A was ligated into
BamHI-digested pBR322. This library generated a number of complementing clones. The
smallest was pGS20, which encoded the 3' end of yigR and approximately three quarters of
yigT as shown in Figure 5. This suggested that the products of the putative genes yigTUW
were responsible for DmsA targeting to the membrane and Nap translocation to the periplasm
and these genes were renamed mttABC (membrane targeting and translocation). This region
was cloned from wild-type HB101 utilizing PCR as follows.

For PCR cloning of the mttABC region, the chromosomal DNA template for PCR was
prepared from HB101. Bacteria from 1.5 ml of an overnight culture were pelleted in an
Eppendorf tube and resuspended in 100 µl of water. The cells were frozen and thawed three
times, pelleted by centrifugation and 5 µl of the supernatant was used as the PCR template.

The region of the putative mttABC operon was cloned utilizing PCR. The 5' primer
was located at the end of the coding sequence for yigR (b3835) (position 5559-5573 of contig
AE00459) and included the intervening sequence between yigR and mttA. The 3' primer
hybridized immediately after the stop codon of mttC (position 8090-8110). The primers
contained the restriction sites EcoRI and SalI to facilitate cloning into the phagemid pTZ18R
and recombinants were screened in E. coli strain TG1. The ends of the clones were sequenced
to verify the region cloned.
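
The primer coordinates quoted above imply an approximate size for the amplified mttABC region. The following sketch computes it under stated assumptions: both coordinate ranges are taken to be inclusive positions on the same strand of contig AE00459, the 3' primer range is read as 8090-8110, and the restriction-site tails added to the primers are ignored. The function name is hypothetical.

```python
def span_length(start: int, end: int) -> int:
    """Length in bp of an inclusive coordinate interval."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


# Contig positions quoted in the text (assumed inclusive, same strand).
FIVE_PRIME_PRIMER = (5559, 5573)
THREE_PRIME_PRIMER = (8090, 8110)  # assumed reading of the garbled "81 10"

# Template-matching length of each primer, excluding any added EcoRI/SalI tails.
primer_lengths = [span_length(*p) for p in (FIVE_PRIME_PRIMER, THREE_PRIME_PRIMER)]

# Amplified region runs from the first base of the 5' primer
# to the last base of the 3' primer.
amplicon_bp = span_length(FIVE_PRIME_PRIMER[0], THREE_PRIME_PRIMER[1])

print(primer_lengths, amplicon_bp)
```

On these assumptions the template-matching portion of the product is about 2.55 kb, a size that could be checked against a restriction digest of the recombinant phagemid.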

Clones of the ORF region mttABC were subcloned utilizing standard cloning methods
[Sambrook et al. (1989)] and ligated into the vector pBR322. Positive clones and subclones
were transformed into D-43 and tested for anaerobic growth in Gly/DMSO and Gly/Fumarate
broth medium.

The clone of mttABC was able to complement the D-43 mutation only when cloned
into the lower copy number plasmid pBR322 (pBRmttABC) and no complementation (or
growth) was observed when mttABC was cloned into the high copy number plasmid pTZ18R
(pTZmttABC).

The D-43 mutant could not be complemented with plasmid pDMS160 carrying the
wild-type DmsABC operon, suggesting that the mutation mapped outside the structural genes.
Interestingly, the mutant expressed nearly normal levels of DMSO reductase activity but the
activity was soluble rather than membrane-bound. This was surprising given that the
membrane anchor, DmsC, was expressed in these cells (see below) and this suggested that the
mutant was defective in membrane targeting or assembly.

B. Sequencing the mttA region

We compared the sequence of clone pGS20 with the identical region of strain D-43 by
PCR sequencing of both strands as follows. Chromosomal DNA from strains HB101 and
D-43 was prepared as above. The 976 bp region which complements the D-43 mutation was
amplified, the PCR products were sequenced directly and the DNA sequences of both strains
were compared to the published sequence of E. coli [Blattner et al. (1997)]. As Taq DNA
polymerase was used for PCR, two different reaction products, resulting from separately
prepared templates, were sequenced to identify any mutations which may have resulted from
the PCR reaction. Both strands were sequenced in the region of any identified mutations.
We identified only one nucleotide change, altering a C to a T at position 743 of
pGS20. When this region was compared to the sequence of contig AE00459 in the E. coli
genome sequence [Blattner et al. (1997) Science 277:1453-1462], it appeared that the
mutation mapped within the proposed ORF termed b3837. This ORF did not have a normal
E. coli codon usage and so we determined the DNA sequence of this region of AE00459.
Several differences were identified and a revised ORF map of this contig is shown in Figure
5. This revision resulted in several changes: ORF b3836, b3837 and b3838 are no longer
observed and are replaced by a polypeptide which is very similar throughout its length to the
YigT protein of H. influenzae [Fleischmann et al. (1995) Science 269:496-512] (Figure 6).
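
The strain comparison described above amounts to listing the positions at which two aligned sequences differ. A minimal sketch follows; the function name is hypothetical and the two 12-nucleotide fragments are invented for illustration only (they are not the mttA sequence). They are chosen so that a single C to T change converts a proline codon (CCG) into a leucine codon (CTG), the kind of substitution reported for D-43.

```python
def point_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Both sequences must already be aligned and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [
        (pos, ref_base, sample_base)
        for pos, (ref_base, sample_base) in enumerate(
            zip(reference.upper(), sample.upper()), start=1
        )
        if ref_base != sample_base
    ]


# Hypothetical fragments, NOT taken from the patent or from the E. coli genome.
wild_type = "GCGCTGCCGATT"  # reads GCG CTG CCG ATT (Ala Leu Pro Ile)
mutant = "GCGCTGCTGATT"     # reads GCG CTG CTG ATT (Ala Leu Leu Ile)

differences = point_differences(wild_type, mutant)
print(differences)
```

Sequencing two independently prepared PCR products, as described above, corresponds to running such a comparison twice and accepting only differences found in both.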

Figure 6 shows the sequence (SEQ ID NO:1) of E. coli wild-type MttA aligned with
YigT of Haemophilus influenzae (Fleischmann et al., 1995) (SEQ ID NO:2). The two
potential transmembrane segments are denoted as TMS1 and TMS2, respectively. a) denotes
the position of the mutation in MttA which changes proline 25 to leucine; b) denotes the
termination of MttA in clone pGS20. The potential α-helical region is indicated.

The mutation in D-43 changed proline 25 of MttA2 to leucine.
Interestingly, clone pGS20 did not encode the entire MttA polypeptide but terminated at
amino acid 205. The MttA protein is composed of 277 amino acids and has a mass of ~30.6
kDa. Without limiting the invention to any particular mechanism, the MttA protein has two
potential transmembrane helices between residues 15-34 and 107-126. The most likely
orientation is with the amino and carboxyl termini exposed to the periplasm. Residues 150 to
200 are predicted to form a very long α-helix. The mutation in D-43 altered the proline
immediately after the second transmembrane helix and could disrupt this structure of the
protein.

C. Proteins homologous to the MttA protein

A database search of sequences which are related to mttA (i.e., mttA1 and mttA2)
identified a large family of related proteins whose function was previously unknown. In
addition to the Zea mays protein of Settles et al. (1997) Science 278:1467-1470, related
sequences were identified by BLAST searches in Azotobacter chroococcum, Bacillus subtilis,
Haemophilus influenzae, Helicobacter pylori, Mycobacterium leprae, Mycobacterium
tuberculosis, Pseudomonas stutzeri, Rhodococcus erythropolis, and Synechocystis PCC6803
as well as the YbeC sequence of E. coli (Figure 8).

EXAMPLE 7
E. coli mttB And mttC Form An Operon With mttA 



A. The mttABC operon 

Examination of the DNA sequence adjacent to mttA suggested that the upstream gene,
yigR, encodes an aminoglycosyl transferase (BLAST search of the non-redundant data base).
A potential transcription terminator at position 5590-5610 of contig AE00459 [Blattner et al.
(1997) Science 277:1453-1462] separates yigR from mttA.

To test whether the adjacent genes mttB and mttC form an operon with mttA, mRNA
was isolated from aerobically grown HB101 and RT-PCR was used with a primer within mttC
to make a cDNA product. This cDNA was then amplified by PCR with primers within mttA
and mttB, giving the expected product of 270 bp, and mttA and mttC, giving a product of
1091 bp, confirming a single polycistronic mRNA for the mttA, mttB, and mttC genes. To
ensure that the PCR products were not the result of contaminating chromosomal DNA, the
mRNA preparation was extensively digested with DNase prior to PCR and a control omitting
the RT-PCR step did not give any products after PCR amplification.

The nucleotide sequence (SEQ ID NO:45) of the mttABC operon is shown in Figure
11. Figure 7 also shows the nucleotide sequence of the three open reading frames, ORF
RF[3], ORF RF[2] and ORF RF[1], and the encoded amino acid sequences of MttA (SEQ ID
NO:1), MttB (SEQ ID NO:7) and MttC (SEQ ID NO:8), respectively.

B. Proteins homologous to the MttB and MttC proteins 

A database search of sequences which are related to mttB and mttC identified a large
family of related proteins which are organized contiguously in several organisms. In all cases
the function of these proteins was previously unknown.

The nucleotide sequence of mttB (SEQ ID NO:5) is shown in Figure 7. mttB encodes
an integral membrane protein of 258 amino acids with six predicted transmembrane segments.
A large number of related sequences was identified in a BLAST search extending from the
archaebacteria (Archaeoglobus fulgidus), through the eubacteria (Azotobacter chroococcum,
Bacillus subtilis, Haemophilus influenzae, Helicobacter pylori, Mycobacterium leprae,
Mycobacterium tuberculosis), cyanobacteria (Synechocystis PCC6803) to mitochondria of
algae (Reclinomonas americana, Chondrus crispus) and plants (Arabidopsis thaliana,
Marchantia polymorpha) as well as chloroplasts of Porphyra purpurea and Odontella sinensis
(Figure 9).

The nucleotide sequence of the neighboring gene mttC (SEQ ID NO:6) is shown in
Figure 7. mttC encodes a polypeptide of 264 amino acids which is predicted to have at least
one potential transmembrane segment (residues 24-41). The most likely orientation of this
protein results in a large cytoplasmic domain extending from residue 41 to 264. Without
limiting the invention to any particular mechanism, there is the possibility of a second
transmembrane domain at residues 165-182. This possibility may be confirmed by a blaM
gene fusion analysis. Like MttA and MttB, the MttC protein also is a member of a very large
family of homologous proteins which includes two homologous sequences in E. coli (YcfH
and YjjV) as well as homologous sequences in archaebacteria (Methanobacterium
thermoautotrophicum), Mycoplasma (Mycoplasma pneumoniae and Mycoplasma
genitalium), eubacteria (Bacillus subtilis, Haemophilus influenzae, Helicobacter pylori,
Mycobacterium tuberculosis), cyanobacteria (Synechocystis PCC6803), yeast
(Schizosaccharomyces pombe and Saccharomyces cerevisiae), C. elegans and humans (Figure
10). The human protein is notable in having a 440 amino acid extension at the amino
terminus which is not found in the other proteins. This extension is not related to MttA or
MttB.



EXAMPLE 8

Construction of host cells containing a deletion of at least a portion of the genes mttA,
mttB and mttC



The function of MttA, MttB and MttC proteins in a host cell is demonstrated by in
vivo homologous recombination of chromosomal mttA, mttB and mttC as previously described
[Sambasivarao et al. (1991) J. Bacteriol. 5935-5943; Jasin et al. (1984) J. Bacteriol. 159:783-
786]. Briefly, the mttABC operon is cloned into vectors, and the gene whose function is to
be determined (i.e., mttA, mttB or mttC) is mutated, e.g., by insertion of a nucleotide
sequence within the coding region of the gene. The plasmids are then homologously
recombined with chromosomal mttA, mttB or mttC sequences in order to replace the
chromosomal mttA, mttB or mttC genes with the mutated genes of the vectors. The effect of
the mutations on the localization of proteins containing twin-arginine amino acid signal
sequences is compared between the wild-type host cells and the cells containing the mutated
mttA, mttB or mttC genes. These steps are further described as follows.



A. Construction of plasmids carrying deletions or insertions in mttA, mttB and mttC genes

The mttABC operon (Figure 11) is cloned into pTZ18R and pBR322 vectors. In
pBR322, the HindIII site in mttB is unique. The pBR322 containing mttB is then modified by
insertion of a kanamycin gene cartridge at this unique site, while the unique NruI fragment
contained in mttC is replaced by a kanamycin cartridge.

B. Homologous recombination and P1 transduction

The modified plasmids are homologously recombined with chromosomal mttA, mttB
and mttC in E. coli cells which contain either a recBC mutation or a recD mutation. The
resulting recombinant is transferred by P1 transduction to suitable genetic backgrounds for
investigation of the localization of protein expression. The localization (e.g., cytoplasm,
periplasm, cell membranes, extracellular medium) of expression of twin arginine containing
proteins is compared using methods disclosed herein (e.g., functional enzyme activity and
Western blotting) between homologously recombined cells and control cells which had not
been homologously recombined. Localization of expressed twin arginine containing proteins
extracellularly, in the periplasm, or in the cytoplasm of homologously recombined cells as
compared to localization of expression in cell membranes of control cells demonstrates that
the wild-type MttA, MttB or MttC protein whose function had been modified by homologous
recombination functions in targeting expression of the twin arginine containing protein to the
cell membrane. Similarly, accumulation of expressed twin arginine containing proteins in
extracellular medium, in the cytoplasm, or in cell membranes of homologously recombined
cells as compared to periplasmic localization of the expressed twin arginine containing protein
in control cells which had not been homologously recombined indicates that the protein (i.e.,
MttA, MttB or MttC) whose function had been modified by homologous recombination
functions in translocation of the twin arginine containing protein to the periplasm.

EXAMPLE 9

Wild-type and mutant twin-arginine amino acid signal sequences of preDmsA are
cleaved to release mature DmsA

In this Example, the following numbering system for DmsA has been used: the mature
protein starts at Val 46; the leader extends from Met 1 to Ala 45 and the double Arg signal is
at residues 15-21. In order to determine whether preproteins which contain twin-arginine
amino acid signal sequences are cleaved to release a mature polypeptide as suggested by
Berks [Berks (1996)], the two alanine amino acids at the -1 and -3 positions of the
twin-arginine amino acid signal sequence of wild-type DmsA preprotein were replaced with
asparagine, and cleavage of both the wild-type and the mutated twin-arginine amino acid
signal sequences was investigated.

A. Cell culture conditions

Cells were grown anaerobically in Luria Broth [Sambrook (1989)] and these cultures
were used for a 1% inoculum into glycerol minimal medium with 0.167% peptone and
vitamin B1, proline and leucine at final concentrations of 0.005%.

All manipulations of plasmids and strains were carried out as described by Sambrook
et al. (1989).

The upstream untranslated region of DmsA was examined using software from the
Center for Biological Analysis (http://www.cbs.dtu.dk/) to identify potential leader peptidase I
cleavage sites. This analysis indicated that mutation of both Ala43 and Ala45 was needed to
inhibit cleavage. An additional secondary cleavage site with low probability was identified
between Thr36 and Leu37. The two Ala mutated in this study were Ala43 and Ala45 which
are underlined in the following DmsA leader sequence (SEQ ID NO:43) that contains the
twin-arginine amino acid signal sequence:

1 15 30 43 45 

MKTKIPDAVLAAEV SRRGLVK TTIAFFLAMASSALTLPFSRIAHAVDSAI

Mutants were generated by site-directed mutagenesis of single stranded DNA of plasmid
pDMS223 [Rothery and Weiner (1991) Biochemistry 30:8296-8305] using the Sculptor kit
(Amersham) and mutagenic primers to generate the mutants A43N and A43N,A45N. The
mutagenic primer (SEQ ID NO:44) 5'-TTAGTCGGATTAATCACAATGTCGATAGCG-3'
was used. Mutant DNA was subcloned into pDMS160 [Rothery and Weiner (1991)] using
BglII and EcoRI restriction sites, and resequenced to confirm the mutation.
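
The residue numbering used in this Example can be checked directly against the leader sequence. The sketch below is an editorial illustration, not part of the described method. It assumes that the spaces in the printed leader and primer sequences are layout only, and that the primer is read from its third nucleotide; that reading frame is an inference, chosen because it reproduces the A43N,A45N residues. The helper names and the partial codon table (only the codons needed here) are hypothetical.

```python
import re

# DmsA leader (SEQ ID NO:43) with layout spaces removed; residue 1 is Met.
LEADER = "MKTKIPDAVLAAEVSRRGLVKTTIAFFLAMASSALTLPFSRIAHAVDSAI"


def residue(seq: str, pos: int) -> str:
    """Residue at a 1-based position."""
    return seq[pos - 1]


def substitute(seq: str, pos: int, new: str) -> str:
    """Return seq with the residue at 1-based position pos replaced by new."""
    return seq[: pos - 1] + new + seq[pos:]


# Locate the twin-arginine pair (1-based positions of the two Arg residues).
match = re.search("RR", LEADER)
rr_positions = (match.start() + 1, match.end())

# Build the A43N,A45N double mutant described in the text.
double_mutant = substitute(substitute(LEADER, 43, "N"), 45, "N")

# Partial codon table: only the codons present in the primer frame used below.
CODONS = {"AGT": "S", "CGG": "R", "ATT": "I", "AAT": "N",
          "CAC": "H", "GTC": "V", "GAT": "D", "AGC": "S"}
PRIMER = "TTAGTCGGATTAATCACAATGTCGATAGCG"  # SEQ ID NO:44, spaces removed

# Assumed frame: skip the first two nucleotides, then read nine codons.
frame = PRIMER[2:29]
primer_peptide = "".join(CODONS[frame[i:i + 3]] for i in range(0, len(frame), 3))

print(rr_positions, LEADER[14:21], residue(LEADER, 46))
print(double_mutant[39:48], primer_peptide)
```

Read this way, the primer encodes residues 40 to 48 of the double mutant, with asparagine at positions 43 and 45, and the twin-arginine pair falls within the 15-21 signal stated above.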



B. Expression studies 

Samples were removed from the cultures after 30-48 hours of anaerobic growth, the
cells pelleted by centrifugation at 9500 g for 10 min, resuspended and everted envelopes
prepared by French Press lysis. The cytoplasm and membrane fractions were separated by
differential centrifugation. Membranes were washed twice with 50 mM MOPS, pH 7.0, prior to
use. Membrane proteins were solubilized with 1% SDS and polyacrylamide gel
electrophoresis was performed using the Bio-Rad minigel system with a discontinuous SDS
buffer system [Laemmli (1970) Nature 227:680-685]. Western blotting was performed using
affinity purified DmsA antibody with the ECL Western blotting detection reagents from
Amersham Life Sciences.

The results (data not shown) demonstrated cleavage of both the preDmsA proteins
which contained alanine and which contained asparagine in the twin-arginine amino acid
signal sequence to release mature DmsA. These results suggest that twin-arginine amino acid
signal sequences are cleaved by signal peptidase I which also cleaves Sec signal sequences.
Alternatively, a signal peptidase which is different from signal peptidase I and signal
peptidase II, and which has different specificity, may be operative. This possibility is
investigated by N-terminal amino acid sequencing.

C. N-terminal amino acid sequencing 

N-terminal amino acid sequencing is carried out as previously described [Bilous et al.
(1988) Molec. Microbiol. 2:785-795] in order to determine the cleavage site in preDmsA and
other preproteins which contain twin-arginine amino acid signal sequences, e.g., preTorA and
preNapA. A signal peptidase I temperature sensitive mutant is used to determine if preDmsA,
preTorA and preNapA are cleaved at the restrictive temperature. Amino terminal sequences
are determined by automated Edman degradation on an Applied Biosystems Model 470A gas
phase sequenator. Subunits are separated by SDS PAGE and electroblotted onto
polyvinylidene fluoride membranes and electroeluted as described by Cole et al. [J. Bacteriol.
170:2448-2456 (1988)].

The above-presented data show that mttA1, mttA2, mttB and mttC encode proteins
MttA1, MttA2, MttB and MttC which are essential in a Sec-independent pathway, and which
function in targeting twin arginine containing proteins to cell membranes and in translocating
twin arginine containing proteins to the periplasm and extracellular medium. The
above-disclosed data further demonstrate that disruption of the function of any one or more of
MttA1, MttA2, MttB and MttC results in translocation of twin arginine containing proteins to
the periplasm, to extracellular medium, or to cellular compartments other than those
compartments in which the twin arginine containing proteins are translocated in cells
containing wild-type MttA1, MttA2, MttB and MttC. These results demonstrate that mttA1,
mttA2, mttB and mttC are useful in translocating twin arginine containing proteins to the
periplasm and extracellular medium. Such translocation is particularly useful in generating
soluble proteins in a functional form, thus facilitating purification of such proteins and
increasing their recovery.

All publications and patents mentioned in the above specification are herein
incorporated by reference. Various modifications and variations of the described method and
system of the invention will be apparent to those skilled in the art without departing from the
scope and spirit of the invention. Although the invention has been described in connection
with specific preferred embodiments, it should be understood that the invention as claimed
should not be unduly limited to such specific embodiments. Indeed, various modifications of
the described modes for carrying out the invention which are obvious to those skilled in the
art and related fields are intended to be within the scope of the following claims.

1. A recombinant polypeptide comprising at least a portion of an amino acid
sequence selected from the group consisting of SEQ ID NO:47, of SEQ ID NO:49, of SEQ
ID NO:7 and variants and homologs thereof, and of SEQ ID NO:8 and variants and homologs
thereof.

2. An isolated nucleic acid sequence encoding at least a portion of an amino acid
sequence selected from the group consisting of SEQ ID NO:47, of SEQ ID NO:49, of SEQ
ID NO:7 and variants and homologs thereof, and of SEQ ID NO:8 and variants and homologs
thereof.

3. The nucleic acid sequence of Claim 2, wherein said nucleic acid sequence is
contained on a recombinant expression vector.

4. The nucleic acid sequence of Claim 3, wherein said expression vector is
contained within a host cell.

5. A nucleic acid sequence that hybridizes under stringent conditions to a nucleic
acid sequence encoding an amino acid sequence selected from the group consisting of SEQ
ID NO:7 and variants and homologs thereof, and SEQ ID NO:8 and variants and homologs
thereof.

6. A method for expressing a nucleotide sequence of interest in a host cell to
produce a soluble polypeptide sequence, said nucleotide sequence of interest when expressed
in the absence of an operably linked nucleic acid sequence encoding a twin-arginine signal
amino acid sequence produces an insoluble polypeptide, comprising:

a) providing:

i) said nucleotide sequence of interest encoding said insoluble
polypeptide;

ii) said nucleic acid sequence encoding said twin-arginine signal
amino acid sequence; and

iii) said host cell, wherein said host cell comprises at least a portion
of an amino acid sequence selected from the group consisting of SEQ ID
NO:47, of SEQ ID NO:49, of SEQ ID NO:7 and variants and homologs
thereof, and of SEQ ID NO:8 and variants and homologs thereof;

b) operably linking said nucleotide sequence of interest to said nucleic acid
sequence to produce a linked polynucleotide sequence; and

c) introducing said linked polynucleotide sequence into said host cell under
conditions such that said fused polynucleotide sequence is expressed and said soluble
polypeptide is produced.

7. The method of Claim 6, wherein said insoluble polypeptide is comprised in an
inclusion body.

8. The method of Claim 6, wherein said insoluble polypeptide comprises a
cofactor.

9. The method of Claim 8, wherein said cofactor is selected from the group
consisting of iron-sulfur clusters, molybdopterin, polynuclear copper, tryptophan
tryptophylquinone, and flavin adenine dinucleotide.

10. The method of Claim 6, wherein said soluble polypeptide is comprised in
periplasm of said host cell.

11. The method of Claim 6, wherein said host cell is cultured in medium, and
wherein said soluble polypeptide is contained in said medium.

12. The method of Claim 6, wherein said cell is Escherichia coli.

13. The method of Claim 12, wherein said Escherichia coli cell is D-43.

14. The method of Claim 6, wherein said twin-arginine signal amino acid sequence
is selected from the group consisting of SEQ ID NO:41 and SEQ ID NO:42.

15. A method for expressing a nucleotide sequence of interest encoding an amino
acid sequence of interest in a host cell, comprising:

a) providing:

i) said host cell;

ii) said nucleotide sequence of interest;

iii) a first nucleic acid sequence encoding twin-arginine signal amino
acid sequence; and

iv) a second nucleic acid sequence encoding at least a portion of an
amino acid sequence selected from the group consisting of SEQ ID NO:47, of
SEQ ID NO:49, of SEQ ID NO:7 and variants and homologs thereof, and of
SEQ ID NO:8 and variants and homologs thereof;

b) operably fusing said nucleotide sequence of interest to said first nucleic
acid sequence to produce a fused polynucleotide sequence; and

c) introducing said fused polynucleotide sequence and said second nucleic
acid sequence into said host cell under conditions such that said at least portion of said
amino acid sequence selected from the group consisting of SEQ ID NO:47, of SEQ ID
NO:49, of SEQ ID NO:7 and variants and homologs thereof, and of SEQ ID NO:8
and variants and homologs thereof is expressed, and said fused polynucleotide
sequence is expressed to produce a fused polypeptide sequence comprising said twin-arginine
signal amino acid sequence and said amino acid sequence of interest.

16. The method of Claim 15, wherein said expressed amino acid sequence of
interest is contained in periplasm of said host cell.

17. The method of Claim 16, wherein said expressed amino acid sequence of
interest is soluble.

18. The method of Claim 15, wherein said host cell is cultured in medium, and
wherein said expressed amino acid sequence of interest is contained in said medium.

19. The method of Claim 18, wherein said expressed amino acid sequence of
interest is soluble.
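
The twin-arginine signal amino acid sequence recited in Claims 6, 14 and 15 can be illustrated with a minimal sketch that scans the N-terminal region of a pre-protein for a twin-arginine motif. The pattern used here, (S/T)-R-R-x-F-L-K, is the commonly cited consensus and is an assumption of this sketch; it is not SEQ ID NO:41 or SEQ ID NO:42. The function name, the 35-residue window and the example peptide are likewise hypothetical.

```python
import re

# Assumed consensus for a twin-arginine motif: (S/T)-R-R-x-F-L-K.
# Illustrative only; this is not SEQ ID NO:41 or SEQ ID NO:42.
TAT_MOTIF = re.compile(r"[ST]RR.FLK")


def find_twin_arginine_motif(protein, window=35):
    """Return the 1-based start of the first motif hit in the N-terminal
    window of a one-letter protein sequence, or None if there is no hit."""
    match = TAT_MOTIF.search(protein[:window].upper())
    return match.start() + 1 if match else None


# Hypothetical pre-protein, invented only to exercise the function.
example = "MNKTSRRSFLKGAALAGAAAL"
print(find_twin_arginine_motif(example))
```

A strict regular expression keeps the sketch short; a real screen would probably tolerate substitutions at the positions flanking the two arginines.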

10 20 30 40 50 60

TTCTGGCTGGGTGCCACCAGATACCAACGTTGAAGAGTTCGAATTTGCCATTCGTACGGT 

70 80 90 100 110 120 

CTGTGAACCTATCTTTGAGAAACCGCTGGCCGAAATTTCGTTTGGACATGTACTGTTAAA 

130 140 150 160 170 180

TCTGTTTAATACGGCGCGTCGCTTCAATATGGAAGTGCAGCCGCAACTGGTGTTACTCCA 

190 200 210 220 230 240

GAAAACCCTGCTCTACGTCGAAGGGGTAGGACGCCAGCTTTATCCGCAACTCGATTTATG 

250 260 270 280 290 300 

GAAAACGGCGAAGCCTTTCCTGGAGTCGTGGATTAAAGATCAGGTCGGTATTCCTGCGCT 

310 320 330 340 350 360 

GGTGAGAGCATTTAAAGAAAAAGCGCCGTTCTGGGTCGAAAAAATGCCAGAACTGCCTGA 

370 380 390 400 410 420 

ATTGGTTTACGACAGTTTGCGCCAGGGCAAGTATTTACAGCACAGTGTTGATAAGATTGC 

430 440 450 460 470 480 

CCGCGAGCTTCAGTCAAATCATGTACGTCAGGGACAATCGCGTTATTTTCTCGGAATTGG 

490 500 510 520 530 540 

CGCTACGTTAGTATTAAGTGGCACATTCTTGTTGGTCAGCCGACCTGAATGGGGGCTGAT 

550 560 570 580 590 600 

GCCCGGCTGGTTAATGGCAGGTGGTCTGATCGCCTGGTTTGTCGGTTGGCGCAAAACACG 

610 620 630 640 650 660 

CTGATTTTTTCATCGCTCAAGGCGGGCCGTGTAACGTATAATGCGGCTTTGTTTAATCAT 

M R L C L I I> 
ORF RF [ 2 ] > 

670 680 690 700 710 720 

CATCTACCACAGAGGAACATGTATGGGTGGTATCAGTATTTGGCAGTTATTGATTATTGC

IYHRGTCM GGIS IWQLLI I A> 
ORF RF[2] > 

730 740 750 760 770 780 
CGTCATCGTTGTACTGCTTTTTGGCACCAAAAAGCTCGGCTCCATCGGTTCCGATCTTGG 
VIVVLLFGTKKLGS I G S D L G> 
ORF RF[2] > 

790 800 810 820 830 840 

TGCGTCGATCAAAGGCTTTAAAAAAGCAATGAGCGATGATGAACCAAAGCAGGATAAAAC 

AS I KGFKKAMSDDEPKQDK T> 
ORF RF[2] >

850 860 870 880 890 900 
CAGTCAGGATGCTGATTTTACTGCGAAAACTATCGCCGATAAGCAGGCGGATACGAATCA
SQDADFTAKTIADKQADTN Q> 
ORF RF[2] > 

910 920 930 940 950 960 

GGAACAGGCTAAAACAGAAGACGCGAAGCGCCACGATAAAGAGCAGGTGAATCCGTGTTT 

EQAKTEDAKRHDKEQVNPC L> 
ORF RF[2] >

970 980 990 1000 1010 1020

GATATCGGTTTTAGCGAACTTGCTATTGGTGTTCATCATCGGCCTCGTCGTTCTGGGGCC 

I SVLANLLLVFI IGLVVLG P> 
ORF RF[2] > 

1030 1040 1050 1060 1070 1080 

GCAACGACTGCCTGTGGCGGTAAAAACGGTAGCGGGCTGGATTCGCGCGTTGCGTTCACT 

QRLPVAVKTVAGWIRALRS L> 
ORF RF[2] _> 

1090 1100 1110 1120 1130 1140 

GGCGACAACGGTGCAGAACGAACTGACCCAGGAGTTAAAACTCCAGGAGTTTCAGGACAG 

ATTVQNELTQELKLQEFQD S> 
ORF RF[2] _> 

1150 1160 1170 1180 1190 1200 

TCTGAAAAAGGTTGAAAAGGCGAGCCTCACTAACCTGACGCCCGAACTGAAAGCGTCGAT 

LKKVEKAS LTNLTPELKAS M> 
ORF RF[2] > 

1210 1220 1230 1240 1250 1260 

GGATGAACTACGCCAGGCCGCGGAGTCGATGAAGCGTTCCTACGTTGCAAACGATCCTGA 

DELRQAAE SMKRSYVANDP E> 
ORF RF[2] __> 

1270 1280 1290 1300 1310 1320 

AAAGGCGAGCGATGAAGCGCACACCATCCATAACCCGGTGGTGAAAGATAATGAAGCTGC

KASDEAHT IHNPVVKDNEA A> 
ORF RF[2] _> 

1330 1340 1350 1360 1370 1380 

GCATGAGGGCGTAACGCCTGCCGCTGCACAAACGCAGGCCAGTTCGCCGGAACAGAAGCC 

H EGVT PAAAQ TQA.S S P EQ K P> 
ORF RF[2] > 

1390 1400 1410 1420 1430 1440 

AGAAACCACGCCAGAGCCGGTGGTAAAACCTGCTGCGGACGCTGAACCGAAAACCGCTGC 

ETTP EPVVKPAADAEPKTA A> 
ORF RF[2] > 

1450 1460 1470 1480 1490 1500 

ACCTTCCCCTTCGTCGAGTGATAAACCGTAAACATGTCTGTAGAAGATACTCAACCGCTT 

MSVEDTQP L> 
ORF RF[1] > 

PSPSSSDKP> 
ORF RF[2] > 

1510 1520 1530 1540 1550 1560 
ATCACGCATCTGATTGAGCTGCGTAAGCGTCTGCTGAACTGCATTATCGCGGTGATCGTG 
I THLIELRKRLLNCI IAVIV> 
ORF RF[1] >

1570 1580 1590 1600 1610 1620 
ATATTCCTGTGTCTGGTCTATTTCGCCAATGACATCTATCACCTGGTATCCGCGCCATTG 
I FLCL.VYFANDIYHLVSAP L> 
ORF RF[1] > 

1630 1640 1650 1660 1670 1680 
ATCAAGCAGTTGCCGCAAGGTTCAACGATGATCGCCACCGACGTGGCCTCGCCGTTCTTT 
IKQLPQGSTMIATDVASPF F> 
ORF RF[1] >

1690 1700 1710 1720 1730 1740
ACGCCGATCAAGCTGACCTTTATGGTGTCGCTGATTCTGTCAGCGCCGGTGATTCTCTAT 
TP IKLTFMVSLI LSAPVIL Y> 
ORF RF[1] >

1750 1760 1770 1780 1790 1800 
CAGGTGTGGGCATTTATCGCCCCAGCGCTGTATAAGCATGAACGTCGCCTGGTGGTGCCG 
QVWAFIAPALYKHERRLVV P> 
ORF RF[1] >

1810 1820 1830 1840 1850 1860 
CTGCTGGTTTCCAGCTCTCTGCTGTTTTATATCGGCATGGCATTCGCCTACTTTGTGGTC 
LLVSSSLLFYIGMAFAYFV V> 
ORF RF[1] : 

1870 1880 1890 1900 1910 1920 
TTTCCGCTGGCATTTGGCTTCCTTGCCAATACCGCGCCGGAAGGGGTGCAGGTATCCACC 
F P LAFGFLANTAPEGVQVS T> 
ORF RF[1] . ; 

1930 1940 1950 1960 1970 1980 
GACATCGCCAGCTATTTAAGCTTCGTTATGGCGCTGTTTATGGCGTTTGGTGTCTCCTTT 
DI ASYLSFVMALFMAFGVS F> 
ORF RF[1] ; 

1990 2000 2010 2020 2030 2040 
GAAGTGCCGGTAGCAATTGTGCTGCTGTGCTGGATGGGGATTACCTCGCCAGAAGACTTA 
EVPVAIVLLCWMGITSPEDL> 
ORF RF[1] >

2050 2060 2070 2080 2090 2100 
CGCAAAAAACGCCCGTATGTGCTGGTTGGTGCATTCGTTGTCGGGATGTTGCTGACGCCG 
RKKRPYVLVGAFVVGMLLT P> 
ORF RF[1] : 

2110 2120 2130 2140 2150 2160 
CCGGATGTCTTCTCGCAAACGCTGTTGGCGATCCCGATGTACTGTCTGTTTGAAATCGGT 
PDVFSQTLLAI PMYCLFEI G> 
ORF RF[1] >

2170 2180 2190 2200 2210 2220 
GTCTTCTTCTCACGCTTTTACGTTGGTAAAGGGCGAAATCGGGAAGAGGAAAACGACGCT 
VFFSRFYVGKGRNREEEND A> 
ORF RF[1] ; 

2230 2240 2250 2260 2270 2280 

GAAGCAGAAAGCGAAAAAACTGAAGAATAAATTCAACCGCCCGTCAGGGCGGTTGTCATA 

EAESEKTE E> 
ORF RF[1] > 



2290 2300 2310 2320 2330 2340 
TGGAGTACAGGATGTTTGATATCGGCGTTAATTTGACCAGTTCGCAATTTGCGAAAGACC 
MEYRMFDIGVNLTS SQFA K D> 
ORF RF[3] : 

2350 2360 2370 2380 2390 2400 
GTGATGATGTTGTAGCGTGCGCTTTTGACGCGGGAGTTAATGGGCTACTCATCACCGGCA 
RDDVVACAFDAGVNGLL IT G> 
ORF RF [ 3 ] : 



2410 2420 2430 2440 2450 2460
CTAACCTGCGTGAAAGCCAGCAGGCGCAAAAGCTGGCGCGTCAGTATTCGTCCTGTTGGT
TNLRESQQAQKLARQYS SC W> 
ORF RF[3] . > 

2470 2480 2490 2500 2510 2520 

CAACGGCGGGCGTACATCCTCACGACAGCAGCCAGTGGCAAGCTGCGACTGAAGAAGCGA 
STAGVHPHDSSQWQAATEEA> 

ORF RF[3] > 

2530 2540 2550 2560 2570 2580 

TTATTGAGCTGGCCGCGCAGCCAGAAGTGGTGGCGATTGGTGAATGTGGTCTCGACTTTA 
IIELAAQPEVVAIGECGLD F> 

ORF RF[3] . > 

2590 2600 2610 2620 2630 2640 

ACCGCAACTTTTCGACGCCGGAAGAGCAGGAACGCGCTTTTGTTGCCCAGCTACGCATTG 

NRN F S TPEEQERAFVA Q LRI> 

ORF RF[3] > 

2650 2660 2670 2680 2690 2700 

CCGCAGATTTAAACATGCCGGTATTTATGCACTGTCGCGATGCCCACGAGCGGTTTATGA 
AADLNMPVFMHCRDAHERF M> 

ORF RF[3] . > 

2710 2720 2730 2740 2750 2760 

CATTGCTGGAGCCGTGGCTGGATAAACTGCCTGGTGCGGTTCTTCATTGCTTTACCGGCA 
TLLEPWLDKLPGAVLHC FT G> 

ORF RF[3] __ > 



2770 2780 2790 2800 2810 2820 

CACGCGAAGAGATGCAGGCGTGCGTGGCGCATGGAATTTATATCGGCATTACCGGTTGGG 
TREEMQACVAHGIYIGITG W> 

ORF RF[3] . — > 

2830 2840 2850 2860 2870 2880 
TTTGCGATGAACGACGCGGACTGGAGCTGCGGGAACTTTTGCCGTTGATTCCGGCGGAAA 
VCDERRGLEL RELLPLI PAE>
, ORF RF[3] . — > 

2890 2900 2910 2920 2930 2940 

AATTACTGATCGAAACTGATGCGCCGTATCTGCTCCCTCGCGATCTCACGCCAAAGCCAT 
KLLIETDAPYLLPRDLTPKP> 

ORF RF[3] > 

2950 2960 2970 2980 2990 3000 

CATCCCGGCGCAACGAGCCAGCCCATCTGCCCCATATTTTGCAACGTATTGCGCACTGGC 
SSRRNEPAHLPHILQRIAHW> 

ORF RF[3] . - > 

3010 3020 3030 3040 3050 3060 

GTGGAGAAGATGCCGCATGGCTGGCTGCCACCACGGATGCTAATGTCAAAACACTGTTTG 
RGEDAAWLAAT TDANVKTLF> 

ORF RF[3] _____ > 

3070 3080 3090 3100 3110 3120 

GGATTGCGTTTTAGAGTTTGCGGAACTCGGTATTCTTCACACTGTGCTTAATCTCTTTAT 

G I A F> 
> 

3130 3140 3150 3160 3170 3180 

TAATAAGATTAAGCAATAGCATGGAGCGAGCCTCACCATCGGGTTCGGTGAAAATGGCCT
GAAAGCC^CGAACGCGOTCGGTA^^

3250 3260 3270 3280 i9Qn 

gatcgacaatgtctttcggtttatataccgatagctgatgaataaccg^cgatgggIcta 

TCGCTGGCGACGCGCCAAAGCGCACGAAGTGGCTGACAC^ 

tggtatgaI?cacttctgggtcaaattccLaaagagg?Lttgggga^ 
^actgc£g?acgttttcc^ 

tcacagcctgtctttcgatctgttcctgggcacgttgaagttgcccgcg^ 

gtaaataccaggattgcata^tgactcttatccgtttaatcggggcgc 

agctttaSctaagttaItJatattccccggtttgcgSataccgt^ 



ItlL 3680 3690 3700 3710 



3720 
GG 



ATTTAACAAATTTACAGCATCGCAAAGATGAACGCCGTATAATGGGCGCAGATTAAGA 

ctacaatgILggcatgaaata^ 

AGGGTGAGCTAAAACGTATCACGCTCCCGGTGGATCCGCATCTG 
CTGACCGCACTTTGCGTGCCGGTGGGCCTGCGCTGTTGTTCGA^^ 

CAATGCC^TGCTGTGCAACCTGTTC^TACGC 

3970 3980 3990 4000 

AGGAAGATGTTTCGGCGCTGCGTGAAGTTGGTAAATTATTG

190 200 210 220 230 240

AGAAAACCCTGCTCTACGTCGAAGGGGTAGGACGCCAGCTTTATCCGCAACTCGATTTAT 

250 260 270 280 290 300 

GGAAAACGGCGAAGCCTTTCCTGGAGTCGTGGATTAAAGATCAGGTCGGTATTCCTGCGC 

310 320 330 340 350 360 

TGGTGAGAGCATTTAAAGAAAAAGCGCCGTTCTGGGTCGAAAAAATGCCAGAACTGCCTG 

370 380 390 400 410 420 

AATTGGTTTACGACAGTTTGCGCCAGGGCAAGTATTTACAGCACAGTGTTGATAAGATTG 

430 440 450 460 470 480 

CCCGCGAGCTTCAGTCAAATCATGTACGTCAGGGACAATCGCGTTATTTTCTCGGAATTG 

490 500 510 520 530 540 

GCGCTACGTTAGTATTAAGTGGCACATTCTTGTTGGTCAGCCGACCTGAATGGGGGCTGA 

550 560 570 580 590 600 

TGCCCGGCTGGTTAATGGCAGGTGGTCTGATCGCCTGGTTTGTCGGTTGGCGCAAAACAC 

610 620 630 640 650 660 

GCTGATTTTTTCATCGCTCAAGGCGGGCCGTGTAACGTATAATGCGGCTTTGTTTAATCA 

M R L C L I> 
ORF RF[3] > 

670 680 690 700 710 720 
TCATCTACCACAGAGGAACATGTATGGGTGGTATCAGTATTTGGCAGTTATTGATTATTG
I IYHRGTCMGGISIWQLLI I> 
ORF RF[3] > 

730 740 750 760 770 780 
CCGTCATCGTTGTACTGCTTTTTGGCACCAAAAAGCTCGGCTCCATCGGTTCCGATCTTG 
AVIVVLLFGTKKLGS IGSD L> 
ORF RF[3] > 

790 800 810 820 830 840 
GTGCGTCGATCAAAGGCTTTAAAAAAGCAATGAGCGATGATGAACCAAAGCAGGATAAAA 
GAS IKGFKKAMSDDEPKQD K> 
. ORF RF[3] > 

850 860 870 880 890 900 
CCAGTCAGGATGCTGATTTTACTGCGAAAACTATCGCCGATAAGCAGGCGGATACGAATC 
TSQDADFTAKTIADKQADT N> 
ORF RF[3] >

910 920 930 940 950 960 
AGGAACAGGCTAAAACAGAAGACGCGAAGCGCCACGATAAAGAGCAGGTGTAATCCGTGT 
QEQAKTEDAKRHDKEQ V> 
ORF RF[3] > 

V> 



970 980 990 1000 1010 1020 

TTGATATCGGTTTTAGCGAACTGCTATTGGTGTTCATCATCGGCCTCGTCGTTCTGGGGC 
FDIGFSELLLVFIIGLVVL G> 



1030 1040 1050 1060 1070 1080 

CGCAACGACTGCCTGTGGCGGTAAAAACGGTAGCGGGCTGGATTCGCGCGTTGCGTTCAC
PQRL PVAVKTVAGWIRALRS> 

> 



1090 1100 1110 1120 1130 1140 

TGGCGACAACGGTGCAGAACGAACTGACCCAGGAGTTAAAACTCCAGGAGTTTCAGGACA 
LATTVQNELTQELKLQEFQD> 



1150 1160 1170 1180 1190 1200

GTCTGAAAAAGGTTGAAAAGGCGAGCCTCACTAACCTGACGCCCGAACTGAAAGCGTCGA 
SLKKVEKASLTNLTPELKA S> 



1210 1220 1230 1240 1250 1260 

TGGATGAACTACGCCAGGCCGCGGAGTCGATGAAGCGTTCCTACGTTGCAAACGATCCTG 
MDELRQAAESMKRSYVANDP> 



1270 1280 1290 1300 1310 1320 

AAAAGGCGAGCGATGAAGCGCACACCATCCATAACCCGGTGGTGAAAGATAATGAAGCTG 
EKASDEAHTIHNPVVKDNE A> 



1330 1340 1350 1360 1370 1380 

CGCATGAGGGCGTAACGCCTGCCGCTGCACAAACGCAGGCCAGTTCGCCGGAACAGAAGC 
AHEGVTPAAAQTQASS PEQ K> 



1390 1400 1410 1420 1430 1440 

CAGAAACCACGCCAGAGCCGGTGGTAAAACCTGCTGCGGACGCTGAACCGAAAACCGCTG 
PET T PE PVVKPAADAE P KT A> 



1450 1460 1470 1480 1490 1500 

CACCTTCCCCTTCGTCGAGTGATAAACCGTAAACATGTCTGTAGAAGATACTCAACCGCT 

MSVEDTQP L> 
ORF RF[2] > 

APSPSSSDKP> 



1510 1520 1530 1540 1550 1560 
TATCACGCATCTGATTGAGCTGCGTAAGCGTCTGCTGAACTGCATTATCGCGGTGATCGT 
ITHLIELRKRLLNCI IAVI V> 
ORF RF[2] > 

1570 1580 1590 1600 1610 1620 
GATATTCCTGTGTCTGGTCTATTTCGCCAATGACATCTATCACCTGGTATCCGCGCCATT 
I FLCLVYFANDIYHLVSAP L>
ORF RF[2] > 

1630 1640 1650 1660 1670 1680 
GATCAAGCAGTTGCCGCAAGGTTCAACGATGATCGCCACCGACGTGGCCTCGCCGTTCTT 
IKQLPQGSTMIATDVASPFF> 
ORF RF[2] > 

1690 1700 1710 1720 1730 1740 
TACGCCGATCAAGCTGACCTTTATGGTGTCGCTGATTCTGTCAGCGCCGGTGATTCTCTA 
TPIKLTFMVSLILSAPVIL Y> 
ORF RF[2] > 

1750 1760 1770 1780 1790 1800 

TCAGGTGTGGGCATTTATCGCCCCAGCGCTGTATAAGCATGAACGTCGCCTGGTGGTGCC
QVWAFIAPALYKHERRLVVP>
ORF RF[2] > 



1810 1820 1830 1840 1850 1860 
GCTGCTGGTTTCCAGCTCTCTGCTGTTTTATATCGGCATGGCATTCGCCTACTTTGTGGT 
LLVS S S LLFYIGMAFAYFV V> 
ORF RF[2] > 

1870 1880 1890 1900 1910 1920 

CTTTCCGCTGGCATTTGGCTTCCTTGCCAATACCGCGCCGGAAGGGGTGCAGGTATCCAC 

F P LAFG FLANTAP EGVQVS T>
ORF RF[2] _ , > 

1930 1940 1950 1960 1970 1980 
CGACATCGCCAGCTATTTAAGCTTCGTTATGGCGCTGTTTATGGCGTTTGGTGTCTCCTT 
DIASYLSFVMALFMAFGVS F> 
ORF RF[2] „ > 

1990 2000 2010 2020 2030 2040 
TGAAGTGCCGGTAGCAATTGTGCTGCTGTGCTGGATGGGGATTACCTCGCCAGAAGACTT 
EVPVAIVLLCWMGITSPEDL> 
ORF RF[2] , . > 

2050 2060 2070 2080 2090 2100 
ACGCAAAAAACGCCCGTATGTGCTGGTTGGTGCATTCGTTGTCGGGATGTTGCTGACGCC
RKKRPYVLVGAFVVGMLLT P> 
ORF RF[2] __ > 

2110 2120 2130 2140 2150 2160 
GCCGGATGTCTTCTCGCAAACGCTGTTGGCGATCCCGATGTACTGTCTGTTTGAAATCGG 
PDVFSQTLLAIPMYCLFEI G> 
ORF RF[2] . — > 

2170 2180 2190 2200 2210 2220 

TGTCTTCTTCTCACGCTTTTACGTTGGTAAAGGGCGAAATCGGGAAGAGGAAAACGACGC 

VF F SRFYVGKGRNREEEND A> 
__ ORF RF[2] ' > 

2230 2240 2250 2260 2270 2280 

TGAAGCAGAAAGCGAAAAAACTGAAGAATAAATTCAACCGCCCGTCAGGGCGGTTGTCAT 

EAESEKTE E> 
ORF RF[2] > 

2290 2300 2310 2320 2330 2340 
ATGGAGTACAGGATGTTTGATATCGGCGTTAATTTGACCAGTTCGCAATTTGCGAAAGAC 
MEYRMFDIGVNLTS SQFAKD> 
_ORF RF[1] . > 

2350 2360 2370 2380 2390 2400 
CGTGATGATGTTGTAGCGTGCGCTTTTGACGCGGGAGTTAATGGGCTACTCATCACCGGC 
RDDVVACAFDAGVNGLLIT G> 
ORF RF[1] i > 

2410 2420 2430 2440 2450 2460 
ACTAACCTGCGTGAAAGCCAGCAGGCGCAAAAGCTGGCGCGTCAGTATTCGTCCTGTTGG 
TNLRESQQAQKLARQY SSC W> 
ORF RF[1] _ — -> 

2470 2480 2490 2500 2510 2520
TCAACGGCGGGCGTACATCCTCACGACAGCAGCCAGTGGCAAGCTGCGACTGAAGAAGCG
STAGVH PHDSSQWQAATEE A>
ORF RF[1] >

2530 2540 2550 2560 2570 2580
ATTATTGAGCTGGCCGCGCAGCCAGAAGTGGTGGCGATTGGTGAATGTGGTCTCGACTTT 
I I ELAAQPEVVAIGECGLD F> 
ORF RF[1] > 

2590 2600 2610 2620 2630 2640 

AACCGCAACTTTTCGACGCCGGAAGAGCAGGAACGCGCTTTTGTTGCCCAGCTACGCATT 
NRNFSTPEEQ ERAFVAQLRI> 

ORF RF[1] > 

2650 2660 2670 2680 2690 2700 
GCCGCAGATTTAAACATGCCGGTATTTATGCACTGTCGCGATGCCCACGAGCGGTTTATG 
AADLNMPVFMHCRDAHERF M> 
ORF RF[1] _ > 

2710 2720 2730 2740 2750 2760 
ACATTGCTGGAGCCGTGGCTGGATAAACTGCCTGGTGCGGTTCTTCATTGCTTTACCGGC 
TLLEPWLDKL PGAVLH C F T G> 
ORF RF[1] > 

2770 2780 2790 2800 2810 2820 
ACACGCGAAGAGATGCAGGCGTGCGTGGCGCATGGAATTTATATCGGCATTACCGGTTGG 
TREEMQACVAHGIYIGITG W> 
ORF RF[1] > 

2830 2840 2850 2860 2870 2880 
GTTTGCGATGAACGACGCGGACTGGAGCTGCGGGAACTTTTGCCGTTGATTCCGGCGGAA 
VCDERRGLELRELLPLI PA E> 
ORF RF[1] > 

2890 2900 2910 2920 2930 2940 
AAATTACTGATCGAAACTGATGCGCCGTATCTGCTCCCTCGCGATCTCACGCCAAAGCCA 
KLLIETDAPYLLPRDLTPK P> 
ORF RF[1] > 

2950 2960 2970 2980 2990 3000 
TCATCCCGGCGCAACGAGCCAGCCCATCTGCCCCATATTTTGCAACGTATTGCGCACTGG 
SSRRNEPAHLPHILQRIAH W> 
ORF RF[1] > 

3010 3020 3030 3040 3050 3060 
CGTGGAGAAGATGCCGCATGGCTGGCTGCCACCACGGATGCTAATGTCAAAACACTGTTT 
RGEDAAWLAATTDANVKTL F> 
ORF RF[1] > 

3070 3080 3090 3100 3110 3120 

GGGATTGCGTTTTAGAGTTTGCGGAACTCGGTATTCTTCACACTGTGCTTAATCTCTTTA 

G I A F> 
> 

3130 3140 3150 3160 3170 3180 

TTAATAAGATTAAGCAATAGCATGGAGCGAGCCTCACCATCGGGTTCGGTGAAAATGGCC 

3190 3200 3210 3220 3230 3240 

TGAAAGCCTTCGAACGCGCCTTCGGTAATAATCACCTTATCACCCGGATAAGGGGTTGCC 

3250 3260 3270 3280 3290 3300 

GGATCGACAATGTCTTTCGGTTTATATACCGATAGCTGATGAATAACCGCCGATGGGACT 

3310 3320 3330 3340 3350 3360 

ATCGCTGGCGACGCGCCAAAGCGCACGAAGTGGCTGACACCGCGGGTCGCGTTGATAGTC

3370 3380 3390 4000 3410 3420

GTGGTATGAATCACTTCTGGGTCAAATTCCACAAACAGGTAGTTGGGGAACAATGGCTCA 

3430 3440 3450 3460 3470 3480 

CTGACTGCAGTACGTTTTCCACGCACGATTTTTTCCAGGGTGATCATCGGTGCCAGGCAA 

3490 3500 3510 3520 3530 3540 

TTCACAGCCTGTCTTTCGAGGTGTTCCTGGGCACGTTGAAGTTGCCCGCGCTTGCAGTAC 

3550 3560 3570 3580 3590 3600 

AGTAAATACCAGGATTGCATAATGACTCTTATCCGTTTAATCGGGGCGCAAGGATAGCAA 

3610 3620 3630 3640 3650 3660 

AAGCTTTACGCTAAGTTAATTATATTCCCCGGTTTGCGTTATACCGTCAGAGTTCACGCT 

3670 3680 3690 3700 3710 3720 

AATTTAACAAATTTACAGCATCGCAAAGATGAACGCCGTATAATGGGCGCAGATTAAGAG 

3730 3740 3750 3760 3770 3780 

GCTACAATGGACGCCATGAAATATAACGATTTACGCGACTTCTTGACGCTGCTTGAACAG 

3790 3800 3810 3820 3830 3840 

CAGGGTGAGCTAAAACGTATCACGCTCCCGGTGGATCCGCATCTGGAAATCACTGAAATT 

3850 3860 3870 3880 3890 3900 

GCTGACCGCACTTTGCGTGCCGGTGGGCCTGCGCTGTTGTTCGAAAACCCTAAAGGCTAC 

3910 3920 3930 3940 3950 3960 

TCAATGCCGGTGCTGTGCAACCTGTTCGGTACGCCAAAGCGCGTGGCGATGGGCATGGGG 

3970 3980 3990 4000 

CAGGAAGATGTTTCGGCGCTGCGTGAAGTTGGTAAATTATT 
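
The one-letter translations printed beneath the nucleotide rows in the figures above can be reproduced with a short sketch that translates a DNA string in a chosen forward reading frame using the standard genetic code. The function name and the frame numbering (frame 1 starts at the first base passed in) are assumptions of this sketch and need not correspond to the RF labels in the figures.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=1):
    """Translate dna in forward reading frame 1, 2 or 3.

    Whitespace is ignored, stop codons are written as '*', and codons
    containing characters other than A, C, G, T are written as 'X'.
    """
    seq = "".join(dna.upper().split())
    codons = (seq[i:i + 3] for i in range(frame - 1, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Nucleotides taken from the figure row that is annotated MSVEDTQP L.
print(translate("ATGTCTGTAGAAGATACTCAACCGCTT"))
```

Running the same fragment through all three frames is a quick way to check which frame an annotated open reading frame occupies.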

SEQUENCE LISTING



<110> Weiner, Joel H. 

Turner, Raymond J. 



<120> Compositions and Methods for Protein Secretion 



<130> UALB-03697 

<140> PCT/CA99/00272 
<141> 1999-03-29 

<150> 09/053,197 
<151> 1998-04-01 

<150> 09/085,761 
<151> 1998-05-28 



<160> 49 



<170> PatentIn Ver. 2.0



<210> 1 
<211> 277 
<212> PRT 

<213> Escherichia coli 



<400> 1 

Met Arg Leu Cys Leu Ile Ile Ile Tyr His Arg Gly Thr Cys Met Gly
1 5 10 15

Gly Ile Ser Ile Trp Gln Leu Leu Ile Ile Ala Val Ile Val Val Leu
20 25 30

Leu Phe Gly Thr Lys Lys Leu Gly Ser Ile Gly Ser Asp Leu Gly Ala
35 40 45

Ser Ile Lys Gly Phe Lys Lys Ala Met Ser Asp Asp Glu Pro Lys Gln
50 55 60

Asp Lys Thr Ser Gln Asp Ala Asp Phe Thr Ala Lys Thr Ile Ala Asp
65 70 75 80

Lys Gln Ala Asp Thr Asn Gln Glu Gln Ala Lys Thr Glu Asp Ala Lys
85 90 95

Arg His Asp Lys Glu Gln Gly Val Asn Pro Cys Leu Ile Ser Val Leu
100 105 110

Ala Asn Leu Leu Leu Val Phe Ile Ile Gly Leu Val Val Leu Gly Pro
115 120 125

Gln Arg Leu Pro Val Ala Val Lys Thr Val Ala Gly Trp Ile Arg Ala
130 135 140

Leu Arg Ser Leu Ala Thr Thr Val Gln Asn Glu Leu Thr Gln Glu Leu
145 150 155 160

Lys Leu Gln Glu Phe Gln Asp Ser Leu Lys Lys Val Glu Lys Ala Ser
165 170 175

Leu Thr Asn Leu Thr Pro Glu Leu Lys Ala Ser Met Asp Glu Leu Arg
180 185 190

Gln Ala Ala Glu Ser Met Lys Arg Ser Tyr Val Ala Asn Asp Pro Glu
195 200 205

Lys Ala Ser Asp Glu Ala His Thr Ile His Asn Pro Val Val Lys Asp
210 215 220

Asn Glu Ala Ala His Glu Gly Val Thr Pro Ala Ala Ala Gln Thr Gln
225 230 235 240

Ala Ser Ser Pro Glu Gln Lys Pro Glu Thr Thr Pro Glu Pro Val Val
245 250 255

Lys Pro Ala Ala Asp Ala Glu Pro Lys Thr Ala Ala Pro Ser Pro Ser
260 265 270

Ser Ser Asp Lys Pro
275

<210> 2 
<211> 284 
<212> PRT 

<213> Haemophilus influenzae 
<400> 2 

Met Ala Lys Lys Ser Ile Phe Arg Ala Lys Phe Phe Leu Phe Tyr Arg
1 5 10 15

Thr Glu Phe Ile Met Phe Gly Leu Ser Pro Ala Gln Leu Ile Ile Leu
20 25 30

Leu Val Val Ile Leu Leu Ile Phe Gly Thr Lys Lys Leu Arg Asn Ala
35 40 45

Gly Ser Asp Leu Gly Ala Ala Val Lys Gly Phe Lys Lys Ala Met Lys
50 55 60

Glu Asp Glu Lys Val Lys Asp Ala Glu Phe Lys Ser Ile Asp Asn Glu
65 70 75 80

Thr Ala Ser Ala Lys Lys Gly Lys Tyr Lys Arg Glu Arg Asn Arg Leu
85 90 95

Asn Pro Cys Leu Ile Leu Val Phe Gln Asn Leu Phe Tyr Xaa Met Val
100 105 110

Leu Gly Leu Val Val Leu Gly Pro Lys Arg Leu Pro Ile Ala Ile Arg
115 120 125

Thr Val Met Asp Trp Val Lys Thr Ile Arg Gly Leu Ala Ala Asn Val
130 135 140

Gln Asn Glu Leu Lys Gln Glu Leu Lys Leu Gln Glu Leu Gln Asp Ser
145 150 155 160

Ile Lys Lys Ala Glu Ser Leu Asn Leu Gln Ala Leu Ser Pro Glu Leu
165 170 175

Ser Lys Thr Val Glu Glu Leu Lys Ala Gln Ala Asp Lys Met Lys Ala
180 185 190

Glu Leu Glu Asp Lys Ala Ala Gln Ala Gly Thr Thr Val Glu Asp Gln
195 200 205

Ile Lys Glu Ile Lys Ser Ala Ala Glu Asn Ala Glu Lys Ser Gln Asn
210 215 220

Ala Ile Ser Val Glu Glu Ala Ala Glu Thr Leu Ser Glu Ala Glu Arg
225 230 235 240

Thr Pro Thr Asp Leu Thr Ala Leu Glu Thr His Glu Lys Val Glu Leu
245 250 255

Asn Thr His Leu Ser Ser Tyr Tyr Pro Pro Asp Asp Ile Glu Ile Ala
260 265 270

Pro Ala Ser Lys Ser Gln Ser Ser Lys Thr Lys Ser
275 280
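
A three-letter residue listing such as those given under <400> can be collapsed to one-letter code and its residue count compared with the length declared under <211>. The following is a minimal sketch under stated assumptions: the function names are hypothetical, position-number lines are assumed to contain only digits, and any token that is not a recognised three-letter code raises a KeyError rather than being guessed.

```python
# Three-letter to one-letter amino acid codes; Xaa marks an unknown residue.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",
}


def listing_to_one_letter(listing):
    """Convert a three-letter listing to one-letter code.

    Purely numeric tokens (the position rulers) are skipped; an
    unrecognised residue token raises KeyError.
    """
    tokens = [tok for tok in listing.split() if not tok.isdigit()]
    return "".join(THREE_TO_ONE[tok] for tok in tokens)


def matches_declared_length(listing, declared_length):
    """Return True if the residue count equals the <211> value."""
    return len(listing_to_one_letter(listing)) == declared_length


# First sixteen residues of SEQ ID NO:1, with their position ruler.
row = """Met Arg Leu Cys Leu Ile Ile Ile Tyr His Arg Gly Thr Cys Met Gly
1 5 10 15"""
print(listing_to_one_letter(row))
print(matches_declared_length(row, 16))
```

Failing loudly on an unknown token is a deliberate choice here, since a misread residue in a scanned listing is better flagged than silently dropped.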



<210> 3 
<211> 22108 
<212> DNA 

<213> Escherichia coli 
<400> 3 

agtcctgcag aatgaagggt gatttatgtg atttgcatca cttttggtgg gtaaatttat 60
gcaacgcatt tgcgtcatgg tgatgagtat cacgaaaaaa tgttaaaccc ttcggtaaag 120
tgtctttttg cttcttctga ctaaaccgat tcacagagga gttgtatatg tccaagtctg 180
atgtttttca tctcggcctc actaaaaacg atttacaagg ggctacgctt gccatcgtcc 240
ctggcgaccc ggatcgtgtg gaaaagatcg ccgcgctgat ggataagccg gttaagctgg 300
catctcaccg cgaattcact acctggcgtg cagagctgga tggtaaacct gttatcgtct 360
gctctaccgg tatcggcggc ccgtctacct ctattgctgt tgaagagctg gcacagctgg 420
gcattcgcac cttcctgcgt atcggtacaa cgggcgctat tcagccgcat attaatgtgg 480
gtgatgtcct ggttaccacg gcgtctgtcc gtctggatgg cgcgagcctg cacttcgcac 540
cgctggaatt cccggctgtc gctgatttcg aatgtacgac tgcgctggtt gaagctgcga 600
aatccattgg cgcgacaact cacgttggcg tgacagcttc ttctgatacc ttctacccag 660
gtcaggaacg ttacgatact tactctggtc gcgtagttcg tcactttaaa ggttctatgg 720
aagagtggca ggcgatgggc gtaatgaact atgaaatgga atctgcaacc ctgctgacca 780
tgtgtgcaag tcagggcctg cgtgccggta tggtagcggg tgttatcgtt aaccgcaccc 840
agcaagagat cccgaatgct gagacgatga aacaaaccga aagccatgcg gtgaaaatcg 900
tggtggaagc ggcgcgtcgt ctgctgtaat tctcttctcc tgtctgaagg ccgacgcgtt 960
cggccttttg tatttttgcg tagcgcctcg caggaaatgc ctttccaact ggacgtttgt 1020






acagcacaat tctattttgt gcgggtaagt tgttgcgtca ggaggcgttg tggatttctc 1080
aatcatggtt tacgcagtta ttgcgttggt gggtgtggca attggctggc tgtttgccag 1140
ttatcaacat gcgcagcaaa aagccgagca attagctgaa cgtgaagaga tggtcgcgga 1200
gttaagcgcg gcaaaacaac aaattaccca aagcgagcac tggcgtgcag agtgcgagtt 1260
actcaataac gaagtgcgca gcctgcaaag tattaacacc tctctggagg ccgatctgcg 1320
tgaagtaacc acgcggatgg aagccgcaca gcaacatgct gacgataaaa ttcgccagat 1380
gattaacagc gagcagcgcc tcagtgagca gtttgaaaac ctcgccaacc gtatttttga 1440
gcacagcaat cgccgggttg atgagcaaaa ccgtcagagt ctgaacagcc tgttgtcgcc 1500
gctacgtgaa caactggacg gtttccgccg tcaggttcag gacagcttcg gtaaagaagc 1560
acaagaacgc cataccctga cccacgaaat tcgcaatctc cagcaactca acgcgcaaat 1620
ggcccaggaa gcgatcaacc tgacgcgcgc gctgaaaggc gacaataaaa cccagggcaa 1680
ctggggcgag gtagtattga cgcgggtgct ggaggcttcc ggtctgcgtg aagggtatga 1740
atatgaaacc caggtcagca tcgaaaatga cgcccgctcg cggatgcagc cggatgtcat 1800
cgtgcgcctg ccgcagggaa aagatgtggt gatcgacgcc aaaatgacgc tggtcgccta 1860
tgaacgctat tttaacgccg aagacgacta cacccgcgaa agcgcgctac aggaacatat 1920
cgcgtcggtg cgtaaccata tccgtttgct gggacgcaaa gattatcaac agctgccggg 1980
gctgcgaact ctggattacg tgctgatgtt tattcccgtt gaacccgctt ttttactggc 2040
gcttgaccgc cagccggagc tgatcaccga agcgttgaaa aacaacatca tgctggttag 2100
cccgactacg ctgctggtgg cgctgcgcac tatcgccaac ctgtggcgtt atgagcatca 2160
aagccgcaac gcccagcaaa tcgccgatcg tgccagcaag ctgtacgaca agatgcgttt 2220
gttcatcgat gacatgtccg cgattggtca aagtctcgac aaagcgcagg ataattatcg 2280
gcaggcaatg aaaaaactct cttcagggcg cggaaatgtg ctggcgcagg cagaagcgtt 2340
tcgcggttta ggagtagaaa ttaaacgcga gattaatccg gatttggctg aacaggcggt 2400
gagccaggat gaagagtatc gacttcggtc ggttccggag cagccgaatg atgaagctta 2460
tcaacgcgat gatgaatata atcagcagtc gcgctagccc attgggagta gttaagccgg 2520
gtagaaatct agggcatcga cgcccaatct gttacacttc tggaacaatt ttttgatgag 2580
caggcattga gatggtggat aagtcacaag aaacgacgca ctttggtttt cagaccgtcg 2640
cgaaggaaca aaaagcggat atggtcgccc acgttttcca ttccgtggca tcaaaatacg 2700
atgtcatgaa tgatttgatg tcatttggta ttcatcgttt gtggaagcga ttcacgattg 2760
attgcagcgg cgtacgccgt gggcagaccg tgctggatct ggctggtggc accggcgacc 2820
tgacagcgaa attctcccgc ctggtcggag aaactggcaa agtggtcctt gctgatatca 2880
atgaatccat gcccaaaatg ggccgcgaga agctgcgtaa tatcggtgtg attggcaacg 2940
ttgagtatgt tcaggcgaac gctgaggcgc tgccgttccc ggataacacc tttgattgca 3000
tcaccatttc gtttggtctg cgtaacgtca ccgacaaaga taaagcactg cgttcaatgt 3060
atcgcgtgct gaaacccggc ggccgcctgc tggtgcttga gttctcgaag ccaattatcg 3120
agccgctgag caaagcctat gatgcatact ccttccatgt gctgccgcgt attggctcac 3180
tggtcgcgaa cgacgccgac agctaccgtt atctggcaga atccatccgt atgcatcccg 3240
atcaggatac cctgaaagcc atgatgcagg atgccggatt cgaaagtgtc gactactaca 3300
atctgacggc aggggttgtg gcgctgcatc gtggttataa gttctgacag gagaccggaa 3360
atgcctttta aacctttagt gacggcagga attgaaagtc tgctcaacac cttcctgtat 3420
cgctcacccg cgctgaaaac ggcccgctcg cgtctgctgg gtaaagtatt gcgcgtggag 3480
gtaaaaggct tttcgacgtc attgattctg gtgttcagcg aacgccaggt tgatgtactg 3540
ggcgaatggg caggcgatgc tgactgcacc gttatcgcct acgccagtgt gttgccgaaa 3600
cttcgcgatc gccagcagct taccgcactg attcgcagtg gtgagctgga agtgcagggc 3660
gatattcagg tggtgcaaaa cttcgttgcg ctggcagatc tggcagagtt cgaccctgcg 3720
gaactgctgg ccccttatac cggtgatatc gccgctgaag gaatcagcaa agccatgcgc 3780
ggaggcgcaa agttcctgca tcacggcatt aagcgccagc aacgttatgt ggcggaagcc 3840
attactgaag agtggcgtat ggcacccggt ccgcttgaag tggcctggtt tgcggaagag 3900
acggctgccg tcgagcgtgc tgttgatgcc ctgaccaaac ggctggaaaa actggaggct 3960
aaatgacgcc aggtgaagta cggcgcctat atttcatcat tcgcactttt ttaagctacg 4020
gacttgatga actgatcccc aaaatgcgta tcaccctgcc gctacggcta tggcgatact 4080
cattattctg gatgccaaat cggcataaag acaaactttt aggtgagcga ctacgactgg 4140
ccctgcaaga actggggccg gtttggatca agttcgggca aatgttatca acccgccgcg 4200
atctttttcc accgcatatt gccgatcagc tggcgttatt gcaggacaaa gttgctccgt 4260
ttgatggcaa gctggcgaag cagcagattg aagctgcaat gggcggcttg ccggtagaag 4320
cgtggtttga cgattttgaa atcaagccgc tggcttctgc ttctatcgcc caggttcata 4380
ccgcgcgatt gaaatcgaat ggtaaagagg tggtgattaa agtcatccgc ccggatattt 4440
tgccggttat taaagcggat ctgaaactta tctaccgtct ggctcgctgg gtgccgcgtt 4500
tgctgccgga tggtcgccgt ctgcgcccaa ccgaagtggt gcgcgagtac gaaaagacat 4560
tgattgatga actgaatttg ctgcgggaat ctgccaacgc cattcagctt cggcgcaatt 4620
ttgaagacag cccgatgctc tacatcccgg aagtttaccc tgactattgt agtgaaggga 4680
tgatggtgat ggagcgcatt tacggcattc cggtgtctga tgttgcggcg ctggagaaaa 4740
acggcactaa catgaaattg ctggcggaac gcggcgtgca ggtgttcttc actcaggtct 4800
ttcgcgacag ctttttccat gccgatatgc accctggcaa catcttcgta agctatgaac 4860
acccggaaaa cccgaaatat atcggcattg attgcgggat tgttggctcg ctaaacaaag 4920
aagataaacg ctatctggca gaaaacttta tcgccttctt taatcgcgac tatcgcaaag 4980
tggcagagct acacgtcgat tctggctggg tgccaccaga taccaacgtt gaagagttcg 5040
aatttgccat tcgtacggtc tgtgaaccta tctttgagaa accgctggcc gaaatttcgt 5100
ttggacatgt actgttaaat ctgtttaata cggcgcgtcg cttcaatatg gaagtgcagc 5160
cgcaactggt gttactccag aaaaccctgc tctacgtcga aggggtagga cgccagcttt 5220
atccgcaact cgatttatgg aaaacggcga agcctttcct ggagtcgtgg attaaagatc 5280
aggtcggtat tcctgcgctg gtgagagcat ttaaagaaaa agcgccgttc tgggtcgaaa 5340
aaatgccaga actgcctgaa ttggtttacg acagtttgcg ccagggcaag tatttacagc 5400
acagtgttga taagattgcc cgcgagcttc agtcaaatca tgtacgtcag ggacaatcgc 5460
gttattttct cggaattggc gctacgttag tattaagtgg cacattcttg ttggtcagcc 5520
gacctgaatg ggggctgatg cccggctggt taatggcagg tggtctgatc gcctggtttg 5580
tcggttggcg caaaacacgc tgattttttc atcgctcaag gcgggccgtg taacgtataa 5640
tgcggctttg tttaatcatc atctaccaca gaggaacatg tatgggtggt atcagtattt 5700
ggcagttatt gattattgcc gtcatcgttg tactgctttt tggcaccaaa aagctcggct 5760
ccatcggttc cgatcttggt gcgtcgatca aaggctttaa aaaagcaatg agcgatgatg 5820
aaccaaagca ggataaaacc agtcaggatg ctgattttac tgcgaaaact atcgccgata 5880
agcaggcgga tacgaatcag gaacaggcta aaacagaaga cgcgaagcgc cacgataaag 5940
agcaggtgaa tccgtgtttg atatcggttt tagcgaactt gctattggtg ttcatcatcg 6000
gcctcgtcgt tctggggccg caacgactgc ctgtggcggt aaaaacggta gcgggctgga 6060
ttcgcgcgtt gcgttcactg gcgacaacgg tgcagaacga actgacccag gagttaaaac 6120
tccaggagtt tcaggacagt ctgaaaaagg ttgaaaaggc gagcctcact aacctgacgc 6180
ccgaactgaa agcgtcgatg gatgaactac gccaggccgc ggagtcgatg aagcgttcct 6240
acgttgcaaa cgatcctgaa aaggcgagcg atgaagcgca caccatccat aacccggtgg 6300
tgaaagataa tgaagctgcg catgagggcg taacgcctgc cgctgcacaa acgcaggcca 6360
gttcgccgga acagaagcca gaaaccacgc cagagccggt ggtaaaacct gctgcggacg 6420
ctgaaccgaa aaccgctgca ccttcccctt cgtcgagtga taaaccgtaa acatgtctgt 6480
agaagatact caaccgctta tcacgcatct gattgagctg cgtaagcgtc tgctgaactg 6540
cattatcgcg gtgatcgtga tattcctgtg tctggtctat ttcgccaatg acatctatca 6600
cctggtatcc gcgccattga tcaagcagtt gccgcaaggt tcaacgatga tcgccaccga 6660
cgtggcctcg ccgttcttta cgccgatcaa gctgaccttt atggtgtcgc tgattctgtc 6720
agcgccggtg attctctatc aggtgtgggc atttatcgcc ccagcgctgt ataagcatga 6780
acgtcgcctg gtggtgccgc tgctggtttc cagctctctg ctgttttata tcggcatggc 6840
attcgcctac tttgtggtct ttccgctggc atttggcttc cttgccaata ccgcgccgga 6900
aggggtgcag gtatccaccg acatcgccag ctatttaagc ttcgttatgg cgctgtttat 6960
ggcgtttggt gtctcctttg aagtgccggt agcaattgtg ctgctgtgct ggatggggat 7020
tacctcgcca gaagacttac gcaaaaaacg cccgtatgtg ctggttggtg cattcgttgt 7080
cgggatgttg ctgacgccgc cggatgtctt ctcgcaaacg ctgttggcga tcccgatgta 7140
ctgtctgttt gaaatcggtg tcttcttctc acgcttttac gttggtaaag ggcgaaatcg 7200
ggaagaggaa aacgacgctg aagcagaaag cgaaaaaact gaagaataaa ttcaaccgcc 7260
cgtcagggcg gttgtcatat ggagtacagg atgtttgata tcggcgttaa tttgaccagt 7320
tcgcaatttg cgaaagaccg tgatgatgtt gtagcgtgcg cttttgacgc gggagttaat 7380
gggctactca tcaccggcac taacctgcgt gaaagccagc aggcgcaaaa gctggcgcgt 7440
cagtattcgt cctgttggtc aacggcgggc gtacatcctc acgacagcag ccagtggcaa 7500
gctgcgactg aagaagcgat tattgagctg gccgcgcagc cagaagtggt ggcgattggt 7560
gaatgtggtc tcgactttaa ccgcaacttt tcgacgccgg aagagcagga acgcgctttt 7620
gttgcccagc tacgcattgc cgcagattta aacatgccgg tatttatgca ctgtcgcgat 7680
gcccacgagc ggtttatgac attgctggag ccgtggctgg ataaactgcc tggtgcggtt 7740
cttcattgct ttaccggcac acgcgaagag
atcggcatta ccggttgggt ttgcgatgaa 
ccgttgattc cggcggaaaa attactgatc 
gatctcacgc caaagccatc atcccggcgc 
caacgtattg cgcactggcg tggagaagat 
aatgtcaaaa cactgtttgg gattgcgttt 
ctgtgcttaa tctctttatt aataagatta 
ggttcggtga aaatggcctg aaagccttcg 
cccggataag gggttgccgg atcgacaatg 
ataaccgccg atgggactat cgctggcgac 
cgggtcgcgt tgatagtcgt ggtatgaatc 
ttggggaaca atggctcact gactgcagta 
atcatcggtg ccaggcaatt cacagcctgt 
tgcccgcgct tgcagtacag taaataccag 
ggggcgcaag gatagcaaaa gctttacgct 
accgtcagag ttcacgctaa tttaacaaat 
atgggcgcag attaagaggc tacaatggac 
ttgacgctgc ttgaacagca gggtgagcta 
ctggaaatca ctgaaattgc tgaccgcact 
gaaaacccta aaggctactc aatgccggtg 
gtggcgatgg gcatggggca ggaagatgtt 
gcgtttctga aagagccgga gccgccaaaa 
cagtttaagc aagtattgaa catgccgaca 
aaaatcgtct ctggcgatga cgtcgatctc 
gaagatgccg cgccgctgat tacctggggg 
cggcagaatc tgggcattta tcgccagcag 
tggctgtcgc atcgcggcgg cgcgctggat 
gaacgtttcc cggtttctgt ggcgctgggt 
actcccgttc cggatacgct ttcagagtat 
accgaagtgg tgaagtgtat ctccaatgat 
ctggaagggt atatcgaaca aggcgaaact 
ggttactata atgaagtcga tagtttcccg 
gaagatgcga tttaccattc cacctatacc 
ggtgtcgcac tgaacgaagt gtttgtgccg 
gatttttacc tgccgccgga aggctgctct 
cagtacgccg gacacgcgaa gcgcgtcatg 
atgtacacta aatttgtgat cgtttgcgat 
gtgatttggg cgattaccac ccgtatggac 
acgcctattg attatctgga ttttgcctcg 
ctggatgcca cgaataaatg gccgggggaa 
aaagatccag atgttgtcgc gcatattgac 
aacggtaaaa gcgcctgatg cgcgtttgtt 
cgcatgacaa ccttaagctg taaagtgacc 
cgtgtccgca tcgtgccaga cgcggccttt 
gtgatggatg agcgcgacaa acgtccgttc 
tttatcgagc tgcatattgg cgcttctgaa 
cgcatcctca aagatcatca aatcgtggtc 
gatgatgaag agcgtccgat gattttgatt 
tcgattttgc tgacagcgtt ggcgcgtaac 
gggcgtgaag agcagcatct gtatgatctc 
cctggtctgc aagtggtgcc ggtggttgaa 
ggcaccgtgt taacggcggt attgcaggat 
attgccggac gttttgagat ggcgaaaatt 
gcgcgggaag atcgcctgtt tggcgatgcg 
gcccctgaca ggcgggaaga acggcaacta 
acgtatctgg caaaagtcct gcagaatgaa 



atgcaggcgt gcgtggcgca tggaatttat 7800 
cgacgcggac tggagctgcg ggaacttttg 7860 
gaaactgatg cgccgtatct gctccctcgc 7920 
aacgagccag cccatctgcc ccatattttg 7980 
gccgcatggc tggctgccac cacggatgct 8 040 
tagagtttgc ggaactcggt attcttcaca 8100 
agcaatagca tggagcgagc ctcaccatcg 8160 
aacgcgcctt cggtaataat caccttatca 8220 
tctttcggtt tatataccga tagctgatga 8280 
gcgccaaagc gcacgaagtg gctgacaccg 8340 
acttctgggt caaattccac aaacaggtag 8400 
cgttttccac gcacgatttt ttccagggtg 8460 
ctttcgaggt gttcctgggc acgttgaagt 8520 
gattgcataa tgactcttat ccgtttaatc 8580 
aagttaatta tattccccgg tttgcgttat 8640 
ttacagcatc gcaaagatga acgccgtata 8700 
gccatgaaat ataacgattt acgcgacttc 8760 
aaacgtatca cgctcccggt ggatccgcat 8 82 0 
ttgcgtzgccg gtgggcctgc gctgttgttc 8880 
ctgtgcaacc tgttcggtac gccaaagcgc 8940 
tcggcgctgc gtgaagttgg taaattattg 9000 
ggtttccgcg acctgtttga taaactgccg 9060 
aagcggctgc gtggtgcgcc ctgccaacaa 9120 
aatcgcattc ccattatgac ctgctggccg 9180 
ctgacagtga cgcgcggccc acataaagag 9240 
ctgattggta aaaacaaact gattatgcgc 93 00 
tatcaggagt ggtgtgcggc gcatccgggc 9360 
gccgatcccg ccacgattct cggtgcagtc 9420 
gcgtttgccg gattgctacg tggcaccaag 9480 
cttgaagtgc ccgccagtgc ggagattgtg 9540 
gcgccggaag ggccgtatgg cgaccacacc 96 00 
gtatttaccg tgacgcatat tacccagcgt 9660 
gggcgtccgc cagatgagcc cgcggtgctg 9720 
attctgcaaa aacagttccc ggaaattgtc 9780 
tatcgcctgg cggtagtgac aatcaaaaaa 984 0 
atgggcgtct ggtcgttctt acgccagttt 9900 
gatgacgtta acgcacgcga ctggaacgat 9960 
ccggcgcggg atactgttct ggtagaaaat 10 02 0 
cctgtctccg ggctgggttc aaaaatgggg 10080 
acccagcgtg aatggggacg tcccatcaaa 1014 0 
gccatctggg atgaactggc tatttttaac 10200 
ttgccctatt tatcgatccg acagagaaag 1026 0 
tcggtagaag ctatcacgga taccgtatat 10320 
tcttttcgtg ctggtcagta tttgatggta 10380 
tcaatggctt cgacgccgga tgaaaaaggg 1044 0 
atcaaccttt acgcgaaagc agtcatggac 10500 
gacattcccc acggagaagc gtggctgcgc 10560 
gcgggcggca ccgggttctc ttatgcccgc 10620 
ccaaaccgtg atatcaccat ttactggggc 10680 
tgcgagcttg aggcgctttc gttgaagcat 10740 
caaccggaag cgggctggcg tgggcgtact 108 00 
cacggtacgc tggcagagca tgatatctat 10860 
gcccgcgatc tgttttgcag tgagcgtaat 10920 
tttgcattta tctgagatat aaaaaaaccc 10980 
aactgttatt cagtggcatt tagatctatg 11040 
gggtgattta tgtgatttgc atcacttttg 11100 
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gtgggtaaat ttatgcaacg catttgcgtc atggtgatga gtatcacgaa aaaatgttaa 11160 
acccttcggt aaagtgtctt tttgcttctt ctgactaaac cgattcacag aggagttgta 11220 
tatgtccaag tctgatgttt ttcatctcgg cctcactaaa aacgatttac aaggggctac 11280 
gcttgccatc gtccctggcg acccggatcg tgtggaaaag atcgccgcgc tgatggataa 11340 
gccggttaag ctggcatctc accgcgaatt cactacctgg cgtgcagagc tggatggtaa 11400
acctgttatc gtctgctcta ccggtatcgg cggcccgtct acctctattg ctgttgaaga 11460 
gctggcacag ctgggcattc gcaccttcct gcgtatcggt acaacgggcg ctattcagcc 11520 
gcatattaat gtgggtgatg tcctggttac cacggcgtct gtccgtctgg atggcgcgag 11580 
cctgcacttc gcaccgctgg aattcccggc tgtcgctgat ttcgaatgta cgactgcgct 11640
ggttgaagct gcgaaatcca ttggcgcgac aactcacgtt ggcgtgacag cttcttctga 11700 
taccttctac ccaggtcagg aacgttacga tacttactct ggtcgcgtag ttcgtcactt 11760 
taaaggttct atggaagagt ggcaggcgat gggcgtaatg aactatgaaa tggaatctgc 11820
aaccctgctg accatgtgtg caagtcaggg cctgcgtgcc ggtatggtag cgggtgttat 11880 
cgttaaccgc acccagcaag agatcccgaa tgctgagacg atgaaacaaa ccgaaagcca 11940 
tgcggtgaaa atcgtggtgg aagcggcgcg tcgtctgctg taattctctt ctcctgtctg 12000
aaggccgacg cgttcggcct tttgtatttt tgcgtagcgc ctcgcaggaa atgcctttcc 12060 
aactggacgt ttgtacagca caattctatt ttgtgcgggt aagttgttgc gtcaggaggc 12120 
gttgtggatt tctcaatcat ggtttacgca gttattgcgt tggtgggtgt ggcaattggc 12180
tggctgtttg ccagttatca acatgcgcag caaaaagccg agcaattagc tgaacgtgaa 12240
gagatggtcg cggagttaag cgcggcaaaa caacaaatta cccaaagcga gcactggcgt 12300
gcagagtgcg agttactcaa taacgaagtg cgcagcctgc aaagtattaa cacctctctg 12360
gaggccgatc tgcgtgaagt aaccacgcgg atggaagccg cacagcaaca tgctgacgat 12420
aaaattcgcc agatgattaa cagcgagcag cgcctcagtg agcagtttga aaacctcgcc 12480 
aaccgtattt ttgagcacag caatcgccgg gttgatgagc aaaaccgtca gagtctgaac 12540 
agcctgttgt cgccgctacg tgaacaactg gacggtttcc gccgtcaggt tcaggacagc 12600 
ttcggtaaag aagcacaaga acgccatacc ctgacccacg aaattcgcaa tctccagcaa 12660 
ctcaacgcgc aaatggccca ggaagcgatc aacctgacgc gcgcgctgaa aggcgacaat 12720 
aaaacccagg gcaactgggg cgaggtagta ttgacgcggg tgctggaggc ttccggtctg 12780 
cgtgaagggt atgaatatga aacccaggtc agcatcgaaa atgacgcccg ctcgcggatg 12840 
cagccggatg tcatcgtgcg cctgccgcag ggaaaagatg tggtgatcga cgccaaaatg 12900 
acgctggtcg cctatgaacg ctattttaac gccgaagacg actacacccg cgaaagcgcg 12960 
ctacaggaac atatcgcgtc ggtgcgtaac catatccgtt tgctgggacg caaagattat 13020
caacagctgc cggggctgcg aactctggat tacgtgctga tgtttattcc cgttgaaccc 13080
gcttttttac tggcgcttga ccgccagccg gagctgatca ccgaagcgtt gaaaaacaac 13140
atcatgctgg ttagcccgac tacgctgctg gtggcgctgc gcactatcgc caacctgtgg 13200
cgttatgagc atcaaagccg caacgcccag caaatcgccg atcgtgccag caagctgtac 13260
gacaagatgc gtttgttcat cgatgacatg tccgcgattg gtcaaagtct cgacaaagcg 13320
caggataatt atcggcaggc aatgaaaaaa ctctcttcag ggcgcggaaa tgtgctggcg 13380
caggcagaag cgtttcgcgg tttaggagta gaaattaaac gcgagattaa tccggatttg 13440
gctgaacagg cggtgagcca ggatgaagag tatcgacttc ggtcggttcc ggagcagccg 13500
aatgatgaag cttatcaacg cgatgatgaa tataatcagc agtcgcgcta gcccattggg 13560
agtagttaag ccgggtagaa atctagggca tcgacgccca atctgttaca cttctggaac 13620
aattttttga tgagcaggca ttgagatggt ggataagtca caagaaacga cgcactttgg 13680
ttttcagacc gtcgcgaagg aacaaaaagc ggatatggtc gcccacgttt tccattccgt 13740 
ggcatcaaaa tacgatgtca tgaatgattt gatgtcattt ggtattcatc gtttgtggaa 13800 
gcgattcacg attgattgca gcggcgtacg ccgtgggcag accgtgctgg atctggctgg 13860 
tggcaccggc gacctgacag cgaaattctc ccgcctggtc ggagaaactg gcaaagtggt 13920
ccttgctgat atcaatgaat ccatgcccaa aatgggccgc gagaagctgc gtaatatcgg 13980
tgtgattggc aacgttgagt atgttcaggc gaacgctgag gcgctgccgt tcccggataa 14040
cacctttgat tgcatcacca tttcgtttgg tctgcgtaac gtcaccgaca aagataaagc 14100
actgcgttca atgtatcgcg tgctgaaacc cggcggccgc ctgctggtgc ttgagttctc 14160
gaagccaatt atcgagccgc tgagcaaagc ctatgatgca tactccttcc atgtgctgcc 14220
gcgtattggc tcactggtcg cgaacgacgc cgacagctac cgttatctgg cagaatccat 14280
ccgtatgcat cccgatcagg ataccctgaa agccatgatg caggatgccg gattcgaaag 14340
tgtcgactac tacaatctga cggcaggggt tgtggcgctg catcgtggtt ataagttctg 14400
acaggagacc ggaaatgcct tttaaacctt tagtgacggc aggaattgaa agtctgctca 14460
acaccttcct gtatcgctca cccgcgctga aaacggcccg ctcgcgtctg ctgggtaaag 14520
tattgcgcgt ggaggtaaaa ggcttttcga cgtcattgat tctggtgttc agcgaacgcc 14580
aggttgatgt actgggcgaa tgggcaggcg atgctgactg caccgttatc gcctacgcca 14640
gtgtgttgcc gaaacttcgc gatcgccagc agcttaccgc actgattcgc agtggtgagc 14700
tggaagtgca gggcgatatt caggtggtgc aaaacttcgt tgcgctggca gatctggcag 14760
agttcgaccc tgcggaactg ctggcccctt ataccggtga tatcgccgct gaaggaatca 14820
gcaaagccat gcgcggaggc gcaaagttcc tgcatcacgg cattaagcgc cagcaacgtt 14880
atgtggcgga agccattact gaagagtggc gtatggcacc cggtccgctt gaagtggcct 14940
ggtttgcgga agagacggct gccgtcgagc gtgctgttga tgccctgacc aaacggctgg 15000
aaaaactgga ggctaaatga cgccaggtga agtacggcgc ctatatttca tcattcgcac 15060
ttttttaagc tacggacttg atgaactgat ccccaaaatg cgtatcaccc tgccgctacg 15120
gctatggcga tactcattat tctggatgcc aaatcggcat aaagacaaac ttttaggtga 15180
gcgactacga ctggccctgc aagaactggg gccggtttgg atcaagttcg ggcaaatgtt 15240
atcaacccgc cgcgatcttt ttccaccgca tattgccgat cagctggcgt tattgcagga 15300
caaagttgct ccgtttgatg gcaagctggc gaagcagcag attgaagctg caatgggcgg 15360
cttgccggta gaagcgtggt ttgacgattt tgaaatcaag ccgctggctt ctgcttctat 15420
cgcccaggtt cataccgcgc gattgaaatc gaatggtaaa gaggtggtga ttaaagtcat 15480
ccgcccggat attttgccgg ttattaaagc ggatctgaaa cttatctacc gtctggctcg 15540
ctgggtgccg cgtttgctgc cggatggtcg ccgtctgcgc ccaaccgaag tggtgcgcga 15600
gtacgaaaag acattgattg atgaactgaa tttgctgcgg gaatctgcca acgccattca 15660
gcttcggcgc aattttgaag acagcccgat gctctacatc ccggaagttt accctgacta 15720
ttgtagtgaa gggatgatgg tgatggagcg catttacggc attccggtgt ctgatgttgc 15780
ggcgctggag aaaaacggca ctaacatgaa attgctggcg gaacgcggcg tgcaggtgtt 15840
cttcactcag gtctttcgcg acagcttttt ccatgccgat atgcaccctg gcaacatctt 15900
cgtaagctat gaacacccgg aaaacccgaa atatatcggc attgattgcg ggattgttgg 15960
ctcgctaaac aaagaagata aacgctatct ggcagaaaac tttatcgcct tctttaatcg 16020
cgactatcgc aaagtggcag agctacacgt cgattctggc tgggtgccac cagataccaa 16080
cgttgaagag ttcgaatttg ccattcgtac ggtctgtgaa cctatctttg agaaaccgct 16140
ggccgaaatt tcgtttggac atgtactgtt aaatctgttt aatacggcgc gtcgcttcaa 16200
tatggaagtg cagccgcaac tggtgttact ccagaaaacc ctgctctacg tcgaaggggt 16260
aggacgccag ctttatccgc aactcgattt atggaaaacg gcgaagcctt tcctggagtc 16320
gtggattaaa gatcaggtcg gtattcctgc gctggtgaga gcatttaaag aaaaagcgcc 16380
gttctgggtc gaaaaaatgc cagaactgcc tgaattggtt tacgacagtt tgcgccaggg 16440
caagtattta cagcacagtg ttgataagat tgcccgcgag cttcagtcaa atcatgtacg 16500
tcagggacaa tcgcgttatt ttctcggaat tggcgctacg ttagtattaa gtggcacatt 16560
cttgttggtc agccgacctg aatgggggct gatgcccggc tggttaatgg caggtggtct 16620
gatcgcctgg tttgtcggtt ggcgcaaaac acgctgattt tttcatcgct caaggcgggc 16680
cgtgtaacgt ataatgcggc tttgtttaat catcatctac cacagaggaa catgtatggg 16740
tggtatcagt atttggcagt tattgattat tgccgtcatc gttgtactgc tttttggcac 16800
caaaaagctc ggctccatcg gttccgatct tggtgcgtcg atcaaaggct ttaaaaaagc 16860
aatgagcgat gatgaaccaa agcaggataa aaccagtcag gatgctgatt ttactgcgaa 16920
aactatcgcc gataagcagg cggatacgaa tcaggaacag gctaaaacag aagacgcgaa 16980
gcgccacgat aaagagcagg tgaatccgtg tttgatatcg gttttagcga acttgctatt 17040
ggtgttcatc atcggcctcg tcgttctggg gccgcaacga ctgcctgtgg cggtaaaaac 17100
ggtagcgggc tggattcgcg cgttgcgttc actggcgaca acggtgcaga acgaactgac 17160
ccaggagtta aaactccagg agtttcagga cagtctgaaa aaggttgaaa aggcgagcct 17220
cactaacctg acgcccgaac tgaaagcgtc gatggatgaa ctacgccagg ccgcggagtc 17280
gatgaagcgt tcctacgttg caaacgatcc tgaaaaggcg agcgatgaag cgcacaccat 17340
ccataacccg gtggtgaaag ataatgaagc tgcgcatgag ggcgtaacgc ctgccgctgc 17400
acaaacgcag gccagttcgc cggaacagaa gccagaaacc acgccagagc cggtggtaaa 17460
acctgctgcg gacgctgaac cgaaaaccgc tgcaccttcc ccttcgtcga gtgataaacc 17520
gtaaacatgt ctgtagaaga tactcaaccg cttatcacgc atctgattga gctgcgtaag 17580
cgtctgctga actgcattat cgcggtgatc gtgatattcc tgtgtctggt ctatttcgcc 17640
aatgacatct atcacctggt atccgcgcca ttgatcaagc agttgccgca aggttcaacg 17700
atgatcgcca ccgacgtggc ctcgccgttc tttacgccga tcaagctgac ctttatggtg 17760
tcgctgattc tgtcagcgcc ggtgattctc tatcaggtgt gggcatttat cgccccagcg 17820
ctgtataagc atgaacgtcg cctggtggtg ccgctgctgg tttccagctc tctgctgttt 17880
tatatcggca tggcattcgc ctactttgtg gtctttccgc tggcatttgg cttccttgcc 17940 
aataccgcgc cggaaggggt gcaggtatcc accgacatcg ccagctattt aagcttcgtt 18000 
atggcgctgt ttatggcgtt tggtgtctcc tttgaagtgc cggtagcaat tgtgctgctg 18060 
tgctggatgg ggattacctc gccagaagac ttacgcaaaa aacgcccgta tgtgctggtt 18120 
ggtgcattcg ttgtcgggat gttgctgacg ccgccggatg tcttctcgca aacgctgttg 18180 
gcgatcccga tgtactgtct gtttgaaatc ggtgtcttct tctcacgctt ttacgttggt 18240 
aaagggcgaa atcgggaaga ggaaaacgac gctgaagcag aaagcgaaaa aactgaagaa 18300 
taaattcaac cgcccgtcag ggcggttgtc atatggagta caggatgttt gatatcggcg 18360 
ttaatttgac cagttcgcaa tttgcgaaag accgtgatga tgttgtagcg tgcgcttttg 18420 
acgcgggagt taatgggcta ctcatcaccg gcactaacct gcgtgaaagc cagcaggcgc 18480 
aaaagctggc gcgtcagtat tcgtcctgtt ggtcaacggc gggcgtacat cctcacgaca 18540 
gcagccagtg gcaagctgcg actgaagaag cgattattga gctggccgcg cagccagaag 18600 
tggtggcgat tggtgaatgt ggtctcgact ttaaccgcaa cttttcgacg ccggaagagc 18660 
aggaacgcgc ttttgttgcc cagctacgca ttgccgcaga tttaaacatg ccggtattta 18720 
tgcactgtcg cgatgcccac gagcggttta tgacattgct ggagccgtgg ctggataaac 18780 
tgcctggtgc ggttcttcat tgctttaccg gcacacgcga agagatgcag gcgtgcgtgg 18840 
cgcatggaat ttatatcggc attaccggtt gggtttgcga tgaacgacgc ggactggagc 18900 
tgcgggaact tttgccgttg attccggcgg aaaaattact gatcgaaact gatgcgccgt 18960
atctgctccc tcgcgatctc acgccaaagc catcatcccg gcgcaacgag ccagcccatc 19020 
tgccccatat tttgcaacgt attgcgcact ggcgtggaga agatgccgca tggctggctg 19080 
ccaccacgga tgctaatgtc aaaacactgt ttgggattgc gttttagagt ttgcggaact 19140 
cggtattctt cacactgtgc ttaatctctt tattaataag attaagcaat agcatggagc 19200
gagcctcacc atcgggttcg gtgaaaatgg cctgaaagcc ttcgaacgcg ccttcggtaa 19260 
taatcacctt atcacccgga taaggggttg ccggatcgac aatgtctttc ggtttatata 19320 
ccgatagctg atgaataacc gccgatggga ctatcgctgg cgacgcgcca aagcgcacga 19380 
agtggctgac accgcgggtc gcgttgatag tcgtggtatg aatcacttct gggtcaaatt 19440 
ccacaaacag gtagttgggg aacaatggct cactgactgc agtacgtttt ccacgcacga 19500 
ttttttccag ggtgatcatc ggtgccaggc aattcacagc ctgtctttcg aggtgttcct 19560 
gggcacgttg aagttgcccg cgcttgcagt acagtaaata ccaggattgc ataatgactc 19620 
ttatccgttt aatcggggcg caaggatagc aaaagcttta cgctaagtta attatattcc 19680 
ccggtttgcg ttataccgtc agagttcacg ctaatttaac aaatttacag catcgcaaag 19740 
atgaacgccg tataatgggc gcagattaag aggctacaat ggacgccatg aaatataacg 19800 
atttacgcga cttcttgacg ctgcttgaac agcagggtga gctaaaacgt atcacgctcc 19860 
cggtggatcc gcatctggaa atcactgaaa ttgctgaccg cactttgcgt gccggtgggc 19920 
ctgcgctgtt gttcgaaaac cctaaaggct actcaatgcc ggtgctgtgc aacctgttcg 19980
gtacgccaaa gcgcgtggcg atgggcatgg ggcaggaaga tgtttcggcg ctgcgtgaag 20040
ttggtaaatt attggcgttt ctgaaagagc cggagccgcc aaaaggtttc cgcgacctgt 20100
ttgataaact gccgcagttt aagcaagtat tgaacatgcc gacaaagcgg ctgcgtggtg 20160
cgccctgcca acaaaaaatc gtctctggcg atgacgtcga tctcaatcgc attcccatta 20220
tgacctgctg gccggaagat gccgcgccgc tgattacctg ggggctgaca gtgacgcgcg 20280 
gcccacataa agagcggcag aatctgggca tttatcgcca gcagctgatt ggtaaaaaca 20340 
aactgattat gcgctggctg tcgcatcgcg gcggcgcgct ggattatcag gagtggtgtg 20400 
cggcgcatcc gggcgaacgt ttcccggttt ctgtggcgct gggtgccgat cccgccacga 20460 
ttctcggtgc agtcactccc gttccggata cgctttcaga gtatgcgttt gccggattgc 20520 
tacgtggcac caagaccgaa gtggtgaagt gtatctccaa tgatcttgaa gtgcccgcca 20580 
gtgcggagat tgtgctggaa gggtatatcg aacaaggcga aactgcgccg gaagggccgt 20640 
atggcgacca caccggttac tataatgaag tcgatagttt cccggtattt accgtgacgc 20700 
atattaccca gcgtgaagat gcgatttacc attccaccta taccgggcgt ccgccagatg 20760 
agcccgcggt gctgggtgtc gcactgaacg aagtgtttgt gccgattctg caaaaacagt 20820
tcccggaaat tgtcgatttt tacctgccgc cggaaggctg ctcttatcgc ctggcggtag 20880
tgacaatcaa aaaacagtac gccggacacg cgaagcgcgt catgatgggc gtctggtcgt 20940
tcttacgcca gtttatgtac actaaatttg tgatcgtttg cgatgatgac gttaacgcac 21000
gcgactggaa cgatgtgatt tgggcgatta ccacccgtat ggacccggcg cgggatactg 21060
ttctggtaga aaatacgcct attgattatc tggattttgc ctcgcctgtc tccgggctgg 21120
gttcaaaaat ggggctggat gccacgaata aatggccggg ggaaacccag cgtgaatggg 21180
gacgtcccat caaaaaagat ccagatgttg tcgcgcatat tgacgccatc tgggatgaac 21240
tggctatttt taacaacggt aaaagcgcct gatgcgcgtt tgttttgccc tatttatcga 21300 
tccgacagag aaagcgcatg acaaccttaa gctgtaaagt gacctcggta gaagctatca 21360 
cggataccgt atatcgtgtc cgcatcgtgc cagacgcggc cttttctttt cgtgctggtc 21420 
agtatttgat ggtagtgatg gatgagcgcg acaaacgtcc gttctcaatg gcttcgacgc 21480 
cggatgaaaa agggtttatc gagctgcata ttggcgcttc tgaaatcaac ctttacgcga 21540 
aagcagtcat ggaccgcatc ctcaaagatc atcaaatcgt ggtcgacatt ccccacggag 21600 
aagcgtggct gcgcgatgat gaagagcgtc cgatgatttt gattgcgggc ggcaccgggt 21660 
tctcttatgc ccgctcgatt ttgctgacag cgttggcgcg taacccaaac cgtgatatca 21720 
ccatttactg gggcgggcgt gaagagcagc atctgtatga tctctgcgag cttgaggcgc 21780 
tttcgttgaa gcatcctggt ctgcaagtgg tgccggtggt tgaacaaccg gaagcgggct 21840 
ggcgtgggcg tactggcacc gtgttaacgg cggtattgca ggatcacggt acgctggcag 21900 
agcatgatat ctatattgcc ggacgttttg agatggcgaa aattgcccgc gatctgtttt 21960 
gcagtgagcg taatgcgcgg gaagatcgcc tgtttggcga tgcgtttgca tttatctgag 22020 
atataaaaaa acccgcccct gacaggcggg aagaacggca actaaactgt tattcagtgg 22080 
catttagatc tatgacgtat ctggcaaa 22108 



<210> 4 
<211> 831 
<212> DNA 

<213> Escherichia coli 



<400> 4 

atgcggcttt gtttaatcat catctaccac agaggaacat gtatgggtgg tatcagtatt 60
tggcagttat tgattattgc cgtcatcgtt gtactgcttt ttggcaccaa aaagctcggc 120
tccatcggtt ccgatcttgg tgcgtcgatc aaaggcttta aaaaagcaat gagcgatgat 180
gaaccaaagc aggataaaac cagtcaggat gctgatttta ctgcgaaaac tatcgccgat 240
aagcaggcgg atacgaatca ggaacaggct aaaacagaag acgcgaagcg ccacgataaa 300
gagcaggtga atccgtgttt gatatcggtt ttagcgaact tgctattggt gttcatcatc 360
ggcctcgtcg ttctggggcc gcaacgactg cctgtggcgg taaaaacggt agcgggctgg 420
attcgcgcgt tgcgttcact ggcgacaacg gtgcagaacg aactgaccca ggagttaaaa 480
ctccaggagt ttcaggacag tctgaaaaag gttgaaaagg cgagcctcac taacctgacg 540
cccgaactga aagcgtcgat ggatgaacta cgccaggccg cggagtcgat gaagcgttcc 600
tacgttgcaa acgatcctga aaaggcgagc gatgaagcgc acaccatcca taacccggtg 660
gtgaaagata atgaagctgc gcatgagggc gtaacgcctg ccgctgcaca aacgcaggcc 720
agttcgccgg aacagaagcc agaaaccacg ccagagccgg tggtaaaacc tgctgcggac 780
gctgaaccga aaaccgctgc accttcccct tcgtcgagtg ataaaccgta a 831



<210> 5 
<211> 778 
<212> DNA 

<213> Escherichia coli 



<400> 5 

atgtctgtag aagatactca accgcttatc acgcatctga ttgagctgcg taagcgtctg 60
ctgaactgca ttatcgcggt gatcgtgata ttcctgtgtc tggtctattt cgccaatgac 120
atctatcacc tggtatccgc gccattgatc aagcagttgc cgcaaggttc aacgatgatc 180
gccaccgacg tggcctcgcc gttctttacg ccgatcaagc tgacctttat ggtgtcgctg 240
attctgtcag cgccggtgat tctctatcag gtgtgggcat ttatcgcccc agcgctgtat 300
aagcatgaac gtcgcctggt ggtgccgctg ctggtttcca gctctctgct gttttatatc 360
ggcatggcat tcgcctactt tgtggtcttt ccgctggcat ttggcttcct tgccaatacc 420
gcgccggaag gggtgcaggt atccaccgac atcgccagct atttaagctt cgttatggcg 480
ctgtttatgg cgtttggtgt ctcctttgaa gtgccggtag caattgtgct gctgtgctgg 540
atggggatta cctcgccaga agacttacgc aaaaaacgcc cgtatgtgct ggttggtgca 600
ttcgttgtcg ggatgttgct gacgccgccg gatgtcttct cgcaaacgct gttggcgatc 660
ccgatgtact gtctgtttga aatcggtgtc ttcttctcac gcttttacgt tggtaaaggg 720
cgaaatcggg aagaggaaaa cgacgctgaa gcagaaagcg aaaaaactga agaataaa 778



<210> 6 
<211> 795 
<212> DNA 

<213> Escherichia coli 



<400> 6 

atggagtaca ggatgtttga tatcggcgtt aatttgacca gttcgcaatt tgcgaaagac 60 
cgtgatgatg ttgtagcgtg cgcttttgac gcgggagtta atgggctact catcaccggc 120 
actaacctgc gtgaaagcca gcaggcgcaa aagctggcgc gtcagtattc gtcctgttgg 180 
tcaacggcgg gcgtacatcc tcacgacagc agccagtggc aagctgcgac tgaagaagcg 240
attattgagc tggccgcgca gccagaagtg gtggcgattg gtgaatgtgg tctcgacttt 300
aaccgcaact tttcgacgcc ggaagagcag gaacgcgctt ttgttgccca gctacgcatt 360
gccgcagatt taaacatgcc ggtatttatg cactgtcgcg atgcccacga gcggtttatg 420
acattgctgg agccgtggct ggataaactg cctggtgcgg ttcttcattg ctttaccggc 480
acacgcgaag agatgcaggc gtgcgtggcg catggaattt atatcggcat taccggttgg 540
gtttgcgatg aacgacgcgg actggagctg cgggaacttt tgccgttgat tccggcggaa 600
aaattactga tcgaaactga tgcgccgtat ctgctccctc gcgatctcac gccaaagcca 660 
tcatcccggc gcaacgagcc agcccatctg ccccatattt tgcaacgtat tgcgcactgg 720 
cgtggagaag atgccgcatg gctggctgcc accacggatg ctaatgtcaa aacactgttt 780 
gggattgcgt tttag 795 



<210> 7 
<211> 258 
<212> PRT 

<213> Escherichia coli 



<400> 7 

Met Ser Val Glu Asp Thr Gln Pro Leu Ile Thr His Leu Ile Glu Leu
1 5 10 15

Arg Lys Arg Leu Leu Asn Cys Ile Ile Ala Val Ile Val Ile Phe Leu
20 25 30

Cys Leu Val Tyr Phe Ala Asn Asp Ile Tyr His Leu Val Ser Ala Pro
35 40 45

Leu Ile Lys Gln Leu Pro Gln Gly Ser Thr Met Ile Xaa Xaa Asp Val
50 55 60

Ala Ser Pro Phe Phe Thr Pro Ile Lys Leu Thr Phe Met Val Ser Leu
65 70 75 80

Ile Leu Ser Ala Pro Val Ile Leu Tyr Gln Val Trp Ala Phe Ile Ala
85 90 95

Pro Ala Leu Tyr Lys His Glu Arg Arg Leu Val Val Pro Leu Leu Val
100 105 110

Ser Ser Ser Leu Leu Phe Leu Tyr Arg His Ala Phe Ala Tyr Phe Val
115 120 125

Val Phe Pro Leu Ala Phe Gly Phe Leu Ala Asn Thr Ala Pro Glu Gly
130 135 140

Val Gln Val Ser Thr Asp Ile Ala Ser Tyr Leu Ser Phe Val Met Ala
145 150 155 160

Leu Phe Met Ala Phe Gly Val Ser Phe Glu Val Pro Val Ala Ile Val
165 170 175

Leu Leu Cys Trp Met Gly Ile Thr Ser Pro Glu Asp Leu Arg Lys Lys
180 185 190

Arg Pro Tyr Val Leu Val Gly Ala Phe Val Val Gly Met Leu Leu Thr
195 200 205

Pro Pro Asp Val Phe Ser Gln Thr Leu Leu Ala Ile Pro Met Tyr Cys
210 215 220

Leu Phe Glu Ile Gly Val Phe Phe Ser Arg Phe Tyr Val Gly Lys Gly
225 230 235 240

Arg Asn Arg Glu Glu Glu Asn Asp Ala Glu Ala Glu Ser Glu Lys Thr
245 250 255

Glu Glu


<210> 8 
<211> 264 
<212> PRT 

<213> Escherichia coli 
<400> 8 

Met Glu Tyr Arg Met Phe Asp Ile Gly Val Asn Leu Thr Ser Ser Gln
1 5 10 15

Phe Ala Lys Asp Arg Asp Asp Val Val Ala Cys Ala Phe Asp Ala Gly
20 25 30

Val Asn Gly Leu Leu Ile Thr Gly Thr Asn Leu Arg Glu Ser Gln Gln
35 40 45

Ala Gln Lys Leu Ala Arg Gln Tyr Ser Ser Cys Trp Ser Thr Ala Gly
50 55 60

Val His Pro His Asp Ser Ser Gln Trp Gln Ala Ala Thr Glu Glu Ala
65 70 75 80

Ile Ile Glu Leu Ala Ala Gln Pro Glu Val Val Ala Ile Gly Glu Cys
85 90 95

Gly Leu Asp Phe Asn Arg Asn Phe Ser Thr Pro Glu Glu Gln Glu Arg
100 105 110

Ala Phe Val Ala Gln Leu Arg Ile Ala Ala Asp Leu Asn Met Pro Val
115 120 125

Phe Met His Cys Arg Asp Ala His Glu Arg Phe Met Thr Leu Leu Glu
130 135 140

Pro Trp Leu Asp Lys Leu Pro Gly Ala Val Leu His Cys Phe Thr Gly
145 150 155 160

Thr Arg Glu Glu Met Gln Ala Cys Val Ala His Gly Ile Tyr Ile Gly
165 170 175

Ile Thr Gly Trp Val Cys Asp Glu Arg Arg Gly Leu Glu Leu Arg Glu
180 185 190

Leu Leu Pro Leu Ile Pro Ala Glu Lys Leu Leu Ile Glu Thr Asp Ala
195 200 205

Pro Tyr Leu Leu Pro Arg Asp Leu Thr Pro Lys Pro Ser Ser Arg Arg
210 215 220

Asn Glu Pro Ala His Leu Pro His Ile Leu Gln Arg Ile Ala His Trp
225 230 235 240

Arg Gly Glu Asp Ala Ala Trp Leu Ala Ala Thr Thr Asp Ala Asn Val
245 250 255

Lys Thr Leu Phe Gly Ile Ala Phe
260



<210> 9 
<211> 243 
<212> PRT 
<213> Zea mays 



<400> 9 

Met Thr Pro Thr Ala Asn Leu Leu Leu Pro Ala Pro Pro Phe Val Pro
1 5 10 15

Ile Ser Asp Val Arg Arg Leu Gln Leu Pro Pro Arg Val Arg His Gln
20 25 30

Pro Arg Pro Cys Trp Lys Gly Val Glu Trp Gly Ser Ile Gln Thr Arg
35 40 45

Met Val Ser Ser Phe Val Ala Val Gly Ser Arg Thr Arg Arg Arg Asn
50 55 60

Val Ile Cys Ala Ser Leu Phe Gly Val Gly Ala Pro Glu Ala Leu Val
65 70 75 80

Ile Gly Val Val Ala Leu Leu Val Phe Gly Pro Lys Gly Leu Ala Glu
85 90 95

Val Ala Arg Asn Leu Gly Lys Thr Leu Arg Ala Phe Gln Pro Thr Ile
100 105 110

Arg Glu Leu Gln Asp Val Ser Arg Glu Phe Arg Ser Thr Leu Glu Arg
115 120 125

Glu Ile Gly Ile Asp Glu Val Ser Gln Ser Thr Asn Tyr Arg Pro Thr
130 135 140

Thr Met Asn Asn Asn Gln Gln Pro Ala Ala Asp Pro Asn Val Lys Pro
145 150 155 160

Glu Pro Ala Pro Tyr Thr Ser Glu Glu Leu Met Lys Val Thr Glu Glu
165 170 175

Gln Ile Ala Ala Ser Ala Ala Ala Ala Trp Asn Pro Gln Gln Pro Ala
180 185 190

Thr Ser Gln Gln Gln Glu Glu Ala Pro Thr Thr Pro Arg Ser Glu Asp
195 200 205

Ala Pro Thr Ser Gly Gly Ser Asp Gly Pro Ala Ala Pro Ala Arg Ala
210 215 220

Val Ser Asp Ser Asp Pro Asn Gln Val Asn Lys Ser Gln Lys Ala Glu
225 230 235 240

Gly Glu Arg



<210> 10 
<211> 67 
<212> PRT 

<213> Escherichia coli 
<400> 10 

Met Gly Glu Ile Ser Ile Thr Lys Leu Leu Val Val Ala Ala Leu Val
1 5 10 15

Val Leu Leu Phe Gly Thr Lys Lys Leu Arg Thr Leu Gly Gly Asp Leu
20 25 30

Gly Ala Ala Ile Lys Gly Phe Lys Lys Ala Met Asn Asp Asp Asp Ala
35 40 45

Ala Ala Lys Lys Gly Ala Asp Val Asp Leu Gln Ala Glu Lys Leu Ser
50 55 60

His Lys Glu 
65 



<210> 11 
<211> 126 
<212> PRT 

<213> Mycobacterium tuberculosis 
<400> 11 

Met Ala Leu Thr Leu Val Met Gly Ala Ile Ala Ser Pro Trp Val Ser
1 5 10 15

Val Gly Thr Lys Leu Cys Tyr Ser Arg Leu Asn Glu Ser Phe Tyr Pro
20 25 30

Ser Asn Pro Leu Thr Ala Pro Asn Pro Met Asn Ile Phe Gly Ile Gly
35 40 45

Leu Pro Glu Leu Gly Leu Ile Phe Val Ile Ala Leu Leu Val Phe Gly
50 55 60

Pro Lys Lys Leu Pro Glu Val Gly Arg Ser Leu Gly Lys Ala Leu Arg
65 70 75 80

Gly Phe Gln Glu Ala Ser Lys Glu Phe Glu Thr Glu Leu Lys Arg Glu
85 90 95

Ala Gln Asn Leu Glu Lys Ser Val Gln Ile Lys Ala Glu Leu Glu Glu
100 105 110

Ser Lys Thr Pro Glu Ser Ser Ser Ser Ser Glu Lys Ala Ser 
115 120 125 



<210> 12 
<211> 98 
<212> PRT 

<213> Rhodococcus erythropolis 
<400> 12 

Met Gly Ala Met Ser Pro Trp His Trp Ala Ile Val Ala Leu Val Val
1 5 10 15

Val Ile Leu Phe Gly Ser Lys Lys Leu Pro Asp Ala Ala Arg Gly Leu
20 25 30

Gly Arg Ser Leu Arg Ile Phe Lys Ser Glu Val Lys Glu Met Gln Asn
35 40 45

Asp Asn Ser Thr Pro Ala Pro Thr Ala Gln Ser Ala Pro Pro Pro Gln
50 55 60

Ser Ala Pro Ala Glu Leu Pro Val Ala Asp Thr Thr Thr Ala Pro Val
65 70 75 80

Thr Pro Pro Ala Pro Val Gln Pro Gln Ser Gln His Thr Glu Pro Lys
85 90 95

Ser Ala



<210> 13
<211> 58
<212> PRT

<213> Pseudomonas stutzeri
<400> 13

[...] Leu Ile Ile Leu Leu Ile Val
1 5 10 15

Val Met Leu Phe Gly Thr Lys Arg Leu Arg Gly Leu Gly Ser Asp Leu
20 25 30

Gly Ser Ala Ile Asn Gly Phe Arg Lys Ser Val Ser Asp Gly Glu Thr
35 40 45

Thr Thr Gln Ala Glu Ala Ser Ser Arg Ser
50 55



<210> 14 
<211> 88 
<212> PRT 

<213> Mycobacterium leprae 
<400> 14 

Met Gly Ser Leu Ser Pro Trp His Trp Val Val Leu Val Val Val Val 
1 5 10 15

Val Leu Leu Phe Gly Ala Lys Lys Leu Pro Asp Ala Ala Arg Ser Leu
20 25 30

Gly Lys Ser Met Arg Ile Phe Lys Ser Glu Leu Arg Glu Met Gln Thr
35 40 45

Glu Asn Gln Ala Gln Ala Ser Ala Leu Glu Thr Pro Met Gln Asn Pro
50 55 60

Thr Val Val Gln Ser Gln Arg Val Val Pro Pro Trp Ser Thr Glu Gln
65 70 75 80 

Asp His Thr Glu Ala Arg Pro Ala 
85 



<210> 15 
<211> 79 
<212> PRT 

<213> Helicobacter pylori 
<400> 15 

Met Gly Gly Phe Thr Ser Ile Trp His Trp Val Ile Val Leu Leu Val
1 5 10 15

Ile Val Leu Leu Phe Gly Ala Lys Lys Ile Pro Glu Leu Ala Lys Gly
20 25 30

Leu Gly Ser Gly Ile Lys Asn Phe Lys Lys Ala Val Lys Asp Asp Glu
35 40 45

Glu Glu Ala Lys Asn Glu Pro Lys Thr Leu Asp Ala Gln Ala Thr Gln
50 55 60

Thr Lys Val His Glu Ser Ser Glu Ile Lys Ser Lys Gln Glu Ser
65 70 75 



<210> 16 
<211> 109 
<212> PRT 

<213> Haemophilus influenzae 
<400> 16 

Met Ala Lys Lys Ser Ile Phe Arg Ala Lys Phe Phe Leu Phe Tyr Arg
1 5 10 15

Thr Glu Phe Ile Met Phe Gly Leu Ser Pro Ala Gln Leu Ile Ile Leu
20 25 30

Leu Val Val Ile Leu Leu Ile Phe Gly Thr Lys Lys Leu Arg Asn Ala
35 40 45

Gly Ser Asp Leu Gly Ala Ala Val Lys Gly Phe Lys Lys Ala Met Lys
50 55 60

Glu Asp Glu Lys Val Lys Asp Ala Glu Phe Lys Ser Ile Asp Asn Glu
65 70 75 80

Thr Ala Ser Ala Lys Lys Gly Lys Tyr Lys Arg Glu Arg Asn Arg Leu
85 90 95

Asn Pro Cys Leu Ile Leu Val Phe Gln Asn Leu Phe Tyr
100 105 



<210> 17 
<211> 57 
<212> PRT 

<213> Bacillus subtilis 
<400> 17 

Met Pro Ile Gly Pro Gly Ser Leu Ala Val Ile Ala Ile Val Ala Leu
1 5 10 15

Ile Ile Phe Gly Pro Lys Lys Leu Pro Glu Leu Gly Lys Ala Ala Gly
20 25 30

Asp Thr Leu Arg Glu Phe Lys Asn Ala Thr Lys Gly Leu Thr Ser Asp
35 40 45

Glu Glu Glu Lys Lys Lys Glu Asp Gln
50 55



<210> 18
<211> 192
<212> PRT

<213> Azotobacter chroococcum
<400> 18

Met Gly Phe Gly Gly Ile Ser Ile Trp Gln Leu Leu Ile Ile Leu Leu
1 5 10 15

Ile Val Val Met Leu Phe Gly Thr Lys Arg Leu Lys Ser Leu Gly Ser
20 25 30

Asp Leu Gly Asp Ala Ile Lys Gly Phe Arg Lys Ser Met Asp Asn Glu
35 40 45

Glu Asn Lys Ala Pro Pro Val Glu Glu Gln Lys Gly Gln Asp His Arg
50 55 60

Gly Pro Gly Pro Gln Gly Arg Gly Thr Gly Gln Glu Arg Leu Ser Met
65 70 75 80

Phe Asp Ile Gly Phe Ser Glu Leu Leu Leu Val Gly Leu Val Ala Leu
85 90 95

Leu Val Leu Gly Pro Glu Arg Leu Pro Val Ala Ala Arg Met Ala Gly
100 105 110

Leu Trp Ile Gly Arg Leu Lys Arg Ser Phe Asn Thr Leu Lys Thr Glu
115 120 125

Val Glu Arg Glu Ile Gly Ala Asp Glu Ile Arg Arg Gln Leu His Asn
130 135 140

Glu Arg Ile Leu Glu Leu Glu Arg Glu Met Lys Gln Ser Leu Gln Pro
145 150 155 160

Pro Ala Pro Ser Ala Pro Asp Glu Thr Ala Ala Ser Pro Ala Thr Pro
165 170 175

Pro Gln Pro Ala Ser Pro Ala Ala His Ser Asp Lys Thr Pro Ser Pro
180 185 190 



<210> 19 
<211> 158 
<212> PRT 

<213> Proteus vulgaris 
<400> 19 

Thr Glu His Leu Glu Glu Leu Arg Gln Arg Thr Val Phe Val Phe Ile
1 5 10 15

Phe Phe Leu Leu Ala Ala Thr Ile Ser Phe Thr Gln Ile Lys Ile Ile
20 25 30

Val Glu Ile Phe Gln Ala Pro Ala Ile Gly Ile Lys Phe Leu Gln Leu
35 40 45

Ala Pro Gly Glu Tyr Phe Phe Ser Ser Ile Lys Ile Ala Ile Tyr Cys
50 55 60

Gly Ile Val Ala Thr Thr Pro Phe Gly Val Tyr Gln Val Ile Leu Tyr
65 70 75 80

Ile Leu Pro Gly Leu Thr Asn Lys Glu Arg Lys Val Ile Leu Pro Ile
85 90 95

Leu Ile Gly Ser Ile Val Leu Phe Ile Val Gly Gly Ile Phe Ala Tyr
100 105 110

Phe Val Leu Ala Pro Ala Ala Leu Asn Phe Leu Ile Ser Tyr Gly Ala
115 120 125

Asp Ile Val Glu Pro Leu Trp Ser Phe Glu Gln Tyr Phe Asp Phe Ile
130 135 140

Leu Leu Leu Leu Phe Ser Thr Gly Leu Ala Phe Glu Ile Pro
145 150 155 



<210> 20 
<211> 168 
<212> PRT 

<213> Marchantia polymorpha 
<400> 20 

Lys Thr Ile Leu Glu Glu Val Arg Ile Arg Val Phe Trp Ile Leu Ile 
1 5 10 15 

Cys Phe Ser Phe Thr Trp Phe Thr Cys Tyr Trp Phe Ser Glu Glu Phe 
20 25 30 

Ile Phe Leu Leu Ala Lys Pro Phe Leu Thr Leu Pro Tyr Leu Asp Ser 
35 40 45 

Ser Phe Ile Cys Thr Gln Leu Thr Glu Ala Leu Ser Thr Tyr Val Thr 
50 55 60 

Thr Ser Leu Ile Ser Cys Phe Tyr Phe Leu Phe Pro Phe Leu Ser Tyr 
65 70 75 80 

Gln Ile Trp Cys Phe Leu Met Pro Ser Cys Tyr Glu Glu Gln Arg Lys 
85 90 95 

Lys Tyr Asn Lys Leu Phe Tyr Leu Ser Gly Phe Cys Phe Phe Leu Phe 
100 105 110 

Phe Phe Val Thr Phe Val Trp Ile Val Pro Asn Val Trp His Phe Leu 
115 120 125 

Tyr Lys Leu Ser Thr Thr Ser Thr Asn Leu Leu Ile Ile Lys Leu Gln 
130 135 140 

Pro Lys Ile Phe Asp Tyr Ile Met Leu Thr Val Arg Ile Leu Phe Ile 
145 150 155 160 

Ser Ser Ile Cys Ser Gln Val Pro 
165 



<210> 21 
<211> 167 
<212> PRT 

<213> Arabidopsis thaliana 
<400> 21 

Glu Thr Ile Leu Gly Glu Val Arg Ile Arg Ser Val Arg Ile Leu Ile 
1 5 10 15 

Gly Leu Gly Leu Thr Trp Phe Thr Cys Tyr Trp Phe Pro Glu Glu Leu 
20 25 30 

Ile Ser Pro Leu Ala Ser Pro Phe Leu Thr Leu Pro Phe Asp Ser Tyr 
35 40 45 

Phe Val Cys Thr Gln Leu Thr Glu Ala Phe Ser Thr Phe Val Ala Thr 
50 55 60 

Ser Ser Ile Ala Cys Ser Tyr Phe Val Phe Pro Leu Ile Ser Tyr Gln 
65 70 75 80 

Ile Trp Cys Phe Leu Ile Pro Ser Cys Tyr Gly Glu Gln Arg Thr Lys 
85 90 95 

Tyr Asn Arg Phe Leu His Leu Ser Gly Ser Arg Phe Phe Leu Phe Leu 
100 105 110 

Phe Leu Thr Pro Pro Arg Val Val Pro Asn Val Trp His Phe Pro Tyr 
115 120 125 

Phe Val Gly Ala Thr Ser Thr Asn Ser Leu Met Ile Lys Leu Gln Pro 
130 135 140 

Lys Ile Tyr Asp His Ile Met Leu Thr Val Arg Ile Ser Phe Ile Pro 
145 150 155 160 

Ser Val Cys Ser Gln Val Pro 
165 



<210> 22 
<211> 163 
<212> PRT 

<213> Reclinomonas americana 
<400> 22 

Leu Thr His Leu Tyr Glu Ile Arg Leu Arg Ile Ile Tyr Leu Leu Tyr 
1 5 10 15 

Ser Ile Phe Leu Thr Cys Phe Cys Ser Tyr Gln Tyr Lys Glu Glu Ile 
20 25 30 

Phe Tyr Leu Leu Phe Ile Pro Leu Ser Lys Asn Phe Ile Tyr Thr Asp 
35 40 45 

Leu Ile Glu Ala Phe Ile Thr Tyr Ile Lys Leu Ser Ile Ile Val Gly 
50 55 60 

Ile Tyr Leu Ser Tyr Pro Ile Phe Leu Tyr Gln Ile Trp Ser Phe Leu 
65 70 75 80 

Ile Pro Gly Phe Phe Leu Tyr Glu Lys Lys Leu Phe Arg Leu Leu Cys 
85 90 95 

Leu Thr Ser Ile Phe Leu Tyr Phe Leu Gly Ser Cys Ile Gly Tyr Tyr 
100 105 110 

Leu Leu Phe Pro Ile Ala Phe Thr Phe Phe Leu Gly Phe Gln Lys Leu 
115 120 125 

Gly Lys Asp Gln Leu Phe Thr Ile Glu Leu Gln Ala Lys Ile His Glu 
130 135 140 

Tyr Leu Ile Leu Asn Thr Lys Leu Ile Phe Ser Leu Ser Ile Cys Phe 
145 150 155 160 

Gln Leu Pro 



<210> 23 
<211> 158 
<212> PRT 

<213> Synechocystis sp. 
<400> 23 

Phe Asp His Leu Asp Glu Leu Arg Thr Arg Ile Phe Leu Ser Leu Gly 
1 5 10 15 

Ala Val Leu Val Gly Val Val Ala Cys Phe Ile Phe Val Lys Pro Leu 
20 25 30 

Val Gln Trp Leu Gln Val Pro Ala Gly Thr Val Lys Phe Leu Gln Leu 
35 40 45 

Ser Pro Gly Glu Phe Phe Phe Val Ser Val Lys Val Ala Gly Tyr Ser 
50 55 60 

Gly Ile Leu Val Met Ser Pro Phe Ile Leu Tyr Gln Ile Ile Gln Phe 
65 70 75 80 

Val Leu Pro Gly Leu Thr Arg Arg Glu Arg Arg Leu Leu Gly Pro Val 
85 90 95 

Val Leu Gly Ser Ser Val Leu Phe Phe Ala Gly Leu Gly Phe Ala Tyr 
100 105 110 

Tyr Ala Leu Ile Pro Ala Ala Leu Lys Phe Phe Val Ser Tyr Gly Ala 
115 120 125 

Asp Val Val Glu Gln Leu Trp Ser Ile Asp Lys Tyr Phe Glu Phe Val 
130 135 140 

Leu Leu Leu Met Phe Ser Thr Gly Leu Ala Phe Gln Ile Pro 
145 150 155 



<210> 24 
<211> 178 
<212> PRT 

<213> Mycobacterium tuberculosis 
<400> 24 

Val Asp His Leu Thr Glu Leu Arg Thr Arg Leu Leu Ile Ser Leu Ala 
1 5 10 15 

Ala Ile Leu Val Thr Thr Ile Phe Gly Phe Val Trp Tyr Ser His Ser 
20 25 30 

Ile Phe Gly Leu Asp Ser Leu Gly Glu Trp Leu Arg His Pro Tyr Cys 
35 40 45 

Ala Leu Pro Gln Ser Ala Arg Ala Asp Ile Ser Ala Asp Gly Glu Cys 
50 55 60 

Arg Leu Leu Ala Thr Ala Pro Phe Asp Gln Phe Met Leu Arg Leu Lys 
65 70 75 80 

Val Gly Met Ala Ala Gly Ile Val Leu Ala Cys Pro Val Trp Phe Tyr 
85 90 95 

Gln Leu Trp Ala Phe Ile Thr Pro Gly Leu Tyr Gln Arg Glu Arg Arg 
100 105 110 

Phe Ala Val Ala Phe Val Ile Pro Ala Ala Val Leu Phe Val Ala Gly 
115 120 125 

Ala Val Leu Ala Tyr Leu Val Leu Ser Lys Ala Leu Gly Phe Leu Leu 
130 135 140 

Thr Val Gly Ser Asp Val Gln Val Thr Ala Leu Ser Gly Asp Arg Tyr 
145 150 155 160 

Phe Gly Phe Leu Leu Asn Leu Leu Val Val Phe Gly Val Ser Phe Glu 
165 170 175 

Phe Pro 



<210> 25 
<211> 155 
<212> PRT 
<213> Helicobacter pylori 
<400> 25 

His Leu Gln Glu Leu Arg Lys Arg Leu Met Val Ser Val Gly Thr Ile 
1 5 10 15 

Leu Val Ala Phe Leu Gly Cys Phe His Phe Trp Lys Ser Ile Phe Glu 
20 25 30 

Phe Val Lys Asn Ser Tyr Lys Gly Thr Leu Ile Gln Leu Ser Pro Ile 
35 40 45 

Glu Gly Val Met Val Ala Val Lys Ile Ser Phe Ser Ala Ala Ile Val 
50 55 60 

Ile Ser Met Pro Ile Ile Phe Trp Gln Leu Trp Leu Phe Ile Ala Pro 
65 70 75 80 

Gly Leu Tyr Lys Asn Glu Lys Lys Val Ile Leu Pro Phe Val Phe Phe 
85 90 95 

Gly Ser Gly Met Phe Leu Ile Gly Ala Ala Phe Ser Tyr Tyr Val Val 
100 105 110 

Phe Pro Phe Ile Ile Glu Tyr Leu Ala Thr Phe Gly Ser Asp Val Phe 
115 120 125 

Ala Ala Asn Ile Ser Ala Ser Ser Tyr Val Ser Phe Phe Thr Arg Leu 
130 135 140 

Ile Leu Gly Phe Gly Val Ala Phe Glu Leu Pro 
145 150 155 



<210> 26 
<211> 163 
<212> PRT 

<213> Haemophilus influenzae 
<400> 26 

Ile Thr His Leu Val Glu Leu Arg Asn Arg Leu Leu Arg Cys Val Ile 
1 5 10 15 

Cys Val Val Leu Val Phe Val Ala Leu Val Tyr Phe Ser Asn Asp Ile 
20 25 30 

Tyr His Phe Val Ala Ala Pro Leu Thr Ala Val Met Pro Lys Gly Ala 
35 40 45 

Thr Met Ile Ala Thr Asn Ile Gln Thr Pro Phe Phe Thr Pro Ile Lys 
50 55 60 

Leu Thr Ala Ile Val Ala Ile Phe Ile Ser Val Pro Tyr Leu Leu Tyr 
65 70 75 80 

Gln Ile Trp Ala Phe Ile Ala Pro Ala Leu Tyr Gln His Glu Lys Arg 
85 90 95 

Met Ile Tyr Pro Leu Leu Phe Ser Ser Thr Ile Leu Phe Tyr Cys Gly 
100 105 110 

Val Ala Phe Ala Tyr Tyr Ile Val Phe Pro Leu Val Phe Ser Phe Phe 
115 120 125 

Thr Gln Thr Ala Pro Glu Gly Val Thr Ile Ala Thr Asp Ile Ser Ser 
130 135 140 

Tyr Leu Asp Phe Ala Leu Ala Leu Phe Leu Ala Phe Gly Val Cys Phe 
145 150 155 160 

Glu Val Pro 



<210> 27 
<211> 161 
<212> PRT 
<213> Bacillus subtilis 
<400> 27 

Leu Glu His Ile Ala Glu Leu Arg Lys Arg Leu Leu Ile Val Ala Leu 
1 5 10 15 

Ala Phe Val Val Phe Phe Ile Ala Gly Phe Phe Leu Ala Lys Pro Ile 
20 25 30 

Ile Val Tyr Leu Gln Glu Thr Asp Glu Ala Lys Gln Leu Thr Leu Asn 
35 40 45 

Ala Phe Asn Leu Thr Asp Pro Leu Tyr Val Phe Met Gln Phe Ala Phe 
50 55 60 

Ile Ile Gly Ile Val Leu Thr Ser Pro Val Ile Leu Tyr Gln Leu Trp 
65 70 75 80 

Ala Phe Val Ser Pro Gly Leu Tyr Glu Lys Glu Arg Lys Val Thr Leu 
85 90 95 

Ser Tyr Ile Pro Val Ser Ile Leu Leu Phe Leu Ala Gly Leu Ser Phe 
100 105 110 

Ser Tyr Tyr Ile Leu Phe Pro Phe Val Val Asp Phe Met Lys Arg Ile 
115 120 125 

Ser Gln Asp Leu Asn Val Asn Gln Val Ile Gly Ile Asn Glu Tyr Phe 
130 135 140 

His Phe Leu Leu Gln Leu Thr Ile Pro Phe Gly Leu Leu Phe Gln Met 
145 150 155 160 

Pro 
<210> 28 
<211> 163 
<212> PRT 

<213> Azotobacter chroococcum 
<400> 28 

Val Ala His Leu Thr Glu Leu Arg Ser Arg Leu Leu Arg Ser Val Ala 
1 5 10 15 

Ala Val Leu Leu Ile Phe Ala Ala Leu Phe Tyr Phe Ala Gln Asp Ile 
20 25 30 

Tyr Ala Leu Val Ser Ala Pro Leu Arg Ala Tyr Leu Pro Glu Gly Ala 
35 40 45 

Thr Met Ile Ala Thr Gly Val Ala Ser Pro Phe Leu Ala Pro Phe Lys 
50 55 60 

Leu Thr Leu Met Ile Ser Leu Phe Leu Ala Met Pro Val Val Leu His 
65 70 75 80 

Gln Val Trp Gly Phe Ile Ala Pro Gly Leu Tyr Gln His Glu Lys Arg 
85 90 95 

Ile Ala Met Pro Leu Met Ala Ser Ser Val Leu Leu Phe Tyr Ala Gly 
100 105 110 

Met Ala Phe Ala Tyr Phe Val Val Phe Pro Ile Met Phe Gly Phe Phe 
115 120 125 

Ala Ser Val Thr Pro Glu Gly Val Ala Met Met Thr Asp Ile Gly Gln 
130 135 140 

Tyr Leu Asp Phe Val Leu Thr Leu Phe Phe Ala Phe Gly Val Ala Phe 
145 150 155 160 

Glu Val Pro 




<210> 29 
<211> 204 
<212> PRT 

<213> Archaeoglobus fulgidus 
<400> 29 

Ile Ala Leu Ile Val Ile Val Val Ser Ser Leu Phe Phe Thr Phe Gly 
1 5 10 15 

Ala Asn Ile Val Val Gly Lys Ile Ile Gly Asp Leu Phe Pro Gly Glu 
20 25 30 

Ala Val Ile Glu Asn Arg Asp Lys Ile Leu Ala Ile Ala Glu Glu Leu 
35 40 45 

Lys Lys Ile Ala Ser Asp Leu Glu Asn Tyr Ala Tyr His Pro Ser Glu 
50 55 60 

Ala Asn Arg Ser Ile Ala Phe Ala Ala Ser Lys Ser Leu Val Arg Ile 
65 70 75 80 

Ala Met Gln Leu Ser Thr Ser Pro Val Leu Leu Thr Pro Leu Glu Gly 
85 90 95 

Leu Leu Leu Tyr Leu Lys Ile Ser Leu Ala Val Gly Ile Ala Ala Ala 
100 105 110 

Leu Pro Tyr Ile Phe His Leu Val Leu Thr Ala Leu Arg Glu Arg Gly 
115 120 125 

Val Ile Thr Phe Ser Phe Arg Lys Thr Ser Ala Phe Lys Tyr Gly Met 
130 135 140 

Ala Ala Ile Phe Leu Phe Ala Leu Gly Ile Phe Tyr Gly Tyr Asn Met 
145 150 155 160 

Met Lys Phe Phe Ile Lys Phe Leu Tyr Leu Met Ala Val Ser Gln Gly 
165 170 175 

Ala Ile Pro Leu Tyr Ser Leu Ser Glu Phe Val Asn Phe Val Ala Leu 
180 185 190 

Met Leu Val Leu Phe Gly Ile Val Phe Glu Leu Pro 
195 200 



<210> 30 
<211> 136 
<212> PRT 

<213> Escherichia coli 
<400> 30 

Asp Val Glu Asp Leu Arg Arg Leu Ala Ala Glu Glu Gly Val Val Ala 
1 5 10 15 

Leu Gly Glu Thr Gly Leu Asp Tyr Tyr Tyr Thr Pro Glu Thr Lys Val 
20 25 30 

Arg Gln Gln Glu Ser Phe Ile His His Ile Gln Ile Gly Arg Glu Leu 
35 40 45 

Asn Lys Pro Val Ile Val His Thr Arg Asp Ala Arg Ala Asp Thr Leu 
50 55 60 

Ala Ile Leu Arg Glu Glu Lys Val Thr Asp Cys Gly Gly Val Leu His 
65 70 75 80 

Cys Phe Thr Glu Asp Arg Glu Thr Ala Gly Lys Leu Leu Asp Leu Gly 
85 90 95 

Phe Tyr Ile Ser Phe Ser Gly Ile Val Thr Phe Arg Asn Ala Glu Gln 
100 105 110 

Leu Arg Asp Ala Ala Arg Tyr Val Pro Leu Asp Arg Leu Leu Val Glu 
115 120 125 

Thr Asp Ser Pro Tyr Leu Ala Pro 
130 135 



<210> 31 
<211> 137 
<212> PRT 

<213> Escherichia coli 
<400> 31 

Ser Leu Glu Gln Leu Gln Gln Ala Leu Glu Arg Arg Pro Ala Lys Val 
1 5 10 15 

Val Ala Val Gly Glu Ile Gly Leu Asp Leu Phe Gly Asp Asp Pro Gln 
20 25 30 

Phe Glu Arg Gln Gln Trp Leu Leu Asp Glu Gln Leu Lys Leu Ala Lys 
35 40 45 

Arg Tyr Asp Leu Pro Val Ile Leu His Ser Arg Arg Thr His Asp Lys 
50 55 60 

Leu Ala Met His Leu Lys Arg His Asp Leu Pro Arg Thr Gly Val Val 
65 70 75 80 

His Gly Phe Ser Gly Ser Leu Gln Gln Ala Glu Arg Phe Val Gln Leu 
85 90 95 

Gly Tyr Lys Ile Gly Val Gly Gly Thr Ile Thr Tyr Pro Arg Ala Ser 
100 105 110 

Lys Thr Arg Asp Val Ile Ala Lys Leu Pro Leu Ala Ser Leu Leu Leu 
115 120 125 

Glu Thr Asp Ala Pro Asp Met Pro Leu 
130 135 



<210> 32 
<211> 135 
<212> PRT 

<213> Methanobacterium thermoautotrophicum 
<400> 32 

Leu Ile Gly Glu Val Val Ser Gln Ile Glu Ser Asn Ile Asp Leu Ile 
1 5 10 15 

Val Ala Val Gly Glu Thr Gly Met Asp Phe His His Thr Arg Asp Glu 
20 25 30 

Glu Gly Arg Arg Arg Gln Glu Glu Thr Phe Arg Val Phe Val Glu Leu 
35 40 45 

Ala Ala Glu His Glu Met Pro Leu Val Val His Ala Arg Asp Ala Glu 
50 55 60 

Glu Arg Ala Leu Glu Thr Val Leu Glu Tyr Arg Val Pro Glu Val Ile 
65 70 75 80 

Phe His Cys Tyr Gly Gly Ser Ile Glu Thr Ala Arg Arg Ile Leu Asp 
85 90 95 

Glu Gly Tyr Tyr Ile Ser Ile Ser Thr Leu Val Ala Phe Ser Glu His 
100 105 110 

His Met Glu Leu Val Arg Ala Ile Pro Leu Glu Gly Met Leu Thr Glu 
115 120 125 

Thr Asp Ser Pro Tyr Leu Ser 
130 135 



<210> 33 
<211> 142 
<212> PRT 

<213> Mycoplasma pneumoniae 
<400> 33 

Ala Gln Ala Thr Leu Lys Lys Leu Val Ser Thr His Arg Ser Phe Ile 
1 5 10 15 

Ser Cys Ile Gly Glu Tyr Gly Phe Asp Tyr His Tyr Thr Lys Asp Tyr 
20 25 30 

Ile Thr Gln Gln Glu Gln Phe Phe Leu Met Gln Phe Gln Leu Ala Glu 
35 40 45 

Gln Tyr Gln Leu Val His Met Leu His Val Arg Asp Val His Glu Arg 
50 55 60 

Ile Tyr Glu Val Leu Lys Arg Leu Lys Pro Lys Gln Pro Val Val Phe 
65 70 75 80 

His Cys Phe Ser Glu Asp Thr Asn Thr Ala Leu Lys Leu Leu Thr Leu 
85 90 95 

Arg Glu Val Gly Leu Lys Val Tyr Phe Ser Ile Pro Gly Ile Val Thr 
100 105 110 

Phe Lys Asn Ala Lys Asn Leu Gln Ala Ala Leu Ser Val Ile Pro Thr 
115 120 125 

Glu Leu Leu Leu Ser Glu Thr Asp Ser Pro Tyr Leu Ala Pro 
130 135 140 



<210> 34 
<211> 140 
<212> PRT 

<213> Mycobacterium tuberculosis 
<400> 34 

Ala Arg Ala Glu Leu Glu Arg Leu Val Ala His Pro Arg Val Val Ala 
1 5 10 15 

Val Gly Glu Thr Gly Ile Asp Met Tyr Trp Pro Gly Arg Leu Asp Gly 
20 25 30 

Cys Ala Glu Pro His Val Gln Arg Glu Ala Phe Ala Trp His Ile Asp 
35 40 45 

Leu Ala Lys Arg Thr Gly Lys Pro Leu Met Ile His Asn Arg Gln Ala 
50 55 60 

Asp Arg Asp Val Leu Asp Val Leu Arg Ala Glu Gly Ala Pro Asp Thr 
65 70 75 80 

Val Ile Leu His Cys Phe Ser Ser Asp Ala Ala Met Ala Arg Thr Cys 
85 90 95 

Val Asp Ala Gly Trp Leu Leu Ser Leu Ser Gly Thr Val Ser Phe Arg 
100 105 110 

Thr Ala Arg Glu Leu Arg Glu Ala Val Pro Leu Met Pro Val Glu Gln 
115 120 125 

Leu Leu Val Glu Thr Asp Ala Pro Tyr Leu Thr Pro 
130 135 140 



<210> 35 
<211> 138 
<212> PRT 

<213> Helicobacter pylori 
<400> 35 

Asp Glu Ser Leu Phe Glu Lys Phe Val Gly His Gln Lys Cys Val Ala 
1 5 10 15 

Ile Gly Glu Cys Gly Leu Asp Tyr Tyr Arg Leu Pro Glu Leu Asn Glu 
20 25 30 

Arg Glu Asn Tyr Lys Ser Lys Gln Lys Glu Ile Phe Thr Lys Gln Ile 
35 40 45 

Glu Phe Ser Ile Gln His Asn Lys Pro Leu Ile Ile His Ile Arg Glu 
50 55 60 

Ala Ser Phe Asp Ser Leu Asn Leu Leu Lys Asn Tyr Pro Lys Ala Phe 
65 70 75 80 

Gly Val Leu His Cys Phe Asn Ala Asp Gly Met Leu Leu Glu Leu Ser 
85 90 95 

Asp Arg Phe Tyr Tyr Gly Ile Gly Gly Val Ser Thr Phe Lys Asn Ala 
100 105 110 

Lys Arg Leu Val Glu Ile Leu Pro Lys Ile Pro Lys Asn Arg Leu Leu 
115 120 125 

Leu Glu Thr Asp Ser Pro Tyr Leu Thr Pro 
130 135 



<210> 36 
<211> 136 
<212> PRT 

<213> Haemophilus influenzae 
<400> 36 

Asp Ala Glu Arg Leu Leu Arg Leu Ala Gln Asp Pro Lys Val Ile Ala 
1 5 10 15 

Ile Gly Glu Ile Gly Leu Asp Tyr Tyr Tyr Ser Ala Asp Asn Lys Ala 
20 25 30 

Ala Gln Gln Ala Val Phe Gly Ser Gln Ile Asp Ile Ala Asn Gln Leu 
35 40 45 

Asp Lys Pro Val Ile Ile His Thr Arg Ser Ala Gly Asp Asp Thr Ile 
50 55 60 

Ala Met Leu Arg Gln His Arg Ala Glu Lys Cys Gly Gly Val Ile His 
65 70 75 80 

Cys Phe Thr Glu Thr Met Glu Phe Xaa Lys Lys Ala Leu Asp Leu Gly 
85 90 95 

Phe Tyr Ile Ser Cys Ser Gly Ile Val Thr Phe Lys Asn Ala Glu Ala 
100 105 110 

Ile Arg Glu Val Ile Arg Tyr Val Pro Met Glu Arg Leu Leu Val Glu 
115 120 125 

Thr Asp Ser Pro Tyr Leu Ala Pro 
130 135 



<210> 37 
<211> 136 
<212> PRT 

<213> Bacillus subtilis 
<400> 37 

Asp Leu Ala Trp Ile Lys Glu Leu Ser Ala His Glu Lys Val Val Ala 
1 5 10 15 

Ile Gly Glu Met Gly Leu Asp Tyr His Trp Asp Lys Ser Pro Lys Asp 
20 25 30 

Ile Gln Lys Glu Val Phe Arg Asn Gln Ile Ala Leu Ala Lys Glu Val 
35 40 45 

Asn Leu Pro Ile Ile Ile His Asn Arg Asp Ala Thr Glu Asp Val Val 
50 55 60 

Thr Ile Leu Lys Glu Glu Gly Ala Glu Ala Val Gly Gly Ile Met His 
65 70 75 80 

Cys Phe Thr Gly Ser Ala Glu Val Ala Arg Glu Cys Met Lys Met Asn 
85 90 95 

Phe Tyr Leu Ser Phe Gly Gly Pro Val Thr Phe Lys Asn Ala Lys Lys 
100 105 110 

Pro Lys Glu Val Val Lys Glu Ile Pro Asn Asp Arg Leu Leu Ile Glu 
115 120 125 

Thr Asp Cys Pro Phe Leu Thr Pro 
130 135 



<210> 38 
<211> 135 
<212> PRT 

<213> Schizosaccharomyces pombe 
<400> 38 

Glu Ala Leu Ala Asn Lys Gly Lys Ala Ser Gly Lys Val Val Ala Phe 
1 5 10 15 

Gly Glu Phe Gly Leu Asp Tyr Asp Arg Leu His Tyr Ala Pro Ala Asp 
20 25 30 

Val Gln Lys Met Tyr Phe Glu Glu Gln Leu Lys Val Ala Val Arg Val 
35 40 45 

Gln Leu Pro Leu Phe Leu His Ser Arg Asn Ala Glu Asn Asp Phe Phe 
50 55 60 

Ala Ile Leu Glu Lys Tyr Leu Pro Glu Leu Pro Lys Lys Gly Val Val 
65 70 75 80 

His Ser Phe Thr Gly Ser Ile Asp Glu Met Arg Arg Cys Ile Glu His 
85 90 95 

Gly Leu Tyr Val Gly Val Asn Gly Cys Ser Leu Lys Thr Glu Glu Asn 
100 105 110 

Leu Glu Val Val Arg Ala Ile Pro Leu Glu Lys Met Leu Leu Glu Thr 
115 120 125 

Asp Ala Pro Trp Cys Glu Val 
130 135 
<210> 39 
<211> 149 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 39 

His Ile Ser Lys Met Glu Gln Phe Phe Val Glu His Glu Arg Asp Ile 
1 5 10 15 

Ile Cys Val Gly Glu Cys Gly Leu Asp His Thr Ile Ser Gln Phe Lys 
20 25 30 

Leu Thr Thr Glu Asp Phe Glu Glu Gln Glu Thr Val Phe Lys Trp Gln 
35 40 45 

Ile Asp Leu Ala Lys His Phe Glu Lys Pro Leu Ile Leu Glu Ile Pro 
50 55 60 

Asp Ile Ser Arg Asn Val His Ser Arg Ser Ala Ala Arg Arg Thr Ile 
65 70 75 80 

Glu Ile Leu Leu Glu Cys His Val Ala Pro Asp Gln Val Val Leu His 
85 90 95 

Ala Phe Asp Gly Thr Pro Gly Asp Leu Lys Leu Gly Leu Glu Ala Gly 
100 105 110 

Tyr Leu Phe Ser Ile Pro Pro Ser Phe Gly Lys Ser Glu Glu Thr Thr 
115 120 125 

Gln Leu Ile Glu Ser Ile Pro Leu Ser Gln Leu Leu Leu Glu Thr Asp 
130 135 140 

Ser Pro Ala Leu Gly 
145 



<210> 40 
<211> 139 
<212> PRT 

<213> Homo sapiens 
<400> 40 

Gln Glu Arg Asn Leu Leu Gln Ala Leu Arg His Pro Lys Ala Val Ala 
1 5 10 15 

Phe Gly Glu Met Gly Leu Asp Tyr Ser Tyr Lys Cys Thr Thr Pro Val 
20 25 30 

Pro Glu Gln His Lys Val Phe Glu Arg Gln Leu Gln Leu Ala Val Ser 
35 40 45 

Leu Lys Lys Pro Leu Val Ile His Cys Arg Glu Ala Asp Glu Asp Leu 
50 55 60 

Leu Glu Ile Met Lys Lys Phe Val Pro Pro Asp Tyr Lys Ile His Arg 
65 70 75 80 

His Cys Phe Thr Gly Ser Tyr Pro Val Ile Glu Pro Leu Leu Lys Tyr 
85 90 95 

Phe Pro Asn Met Ser Val Gly Phe Thr Ala Val Leu Thr Tyr Ser Ser 
100 105 110 

Ala Trp Glu Ala Arg Glu Ala Leu Arg Gln Ile Pro Leu Glu Arg Ile 
115 120 125 

Ile Val Glu Thr Asp Ala Pro Tyr Phe Leu Pro 
130 135 



<210> 41 

<211> 7 

<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: synthetic - 
generic organism. 

<400> 41 

Ser Arg Arg Ser Phe Leu Lys 
1 5 



<210> 42 
<211> 7 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: synthetic - 
generic organism 

<400> 42 

Thr Arg Arg Ser Phe Leu Lys 
1 5 



<210> 43 
<211> 50 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: synthetic 
<400> 43 

Met Lys Thr Lys Ile Pro Asp Ala Val Leu Ala Ala Glu Val Ser Arg 
1 5 10 15 

Arg Gly Leu Val Lys Thr Thr Ile Ala Phe Phe Leu Ala Met Ala Ser 
20 25 30 

Ser Ala Leu Thr Leu Pro Phe Ser Arg Ile Ala His Ala Val Asp Ser 
35 40 45 

Ala Ile 
50 



<210> 44 
<211> 30 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: synthetic 
<400> 44 

ttagtcggat taatcacaat gtcgatagcg 30 

<210> 45 
<211> 3120 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: synthetic 
<400> 45 

attctggctg ggtgccacca gataccaacg ttgaagagtt cgaatttgcc attcgtacgg 60 
tctgtgaacc tatctttgag aaaccgctgg ccgaaatttc gtttggacat gtactgttaa 120 
atctgtttaa tacggcgcgt cgcttcaata tggaagtgca gccgcaactg gtgttactcc 180 
agaaaaccct gctctacgtc gaaggggtag gacgccagct ttatccgcaa ctcgatttat 240 
ggaaaacggc gaagcctttc ctggagtcgt ggattaaaga tcaggtcggt attcctgcgc 300 
tggtgagagc atttaaagaa aaagcgccgt tctgggtcga aaaaatgcca gaactgcctg 360 
aattggttta cgacagtttg cgccagggca agtatttaca gcacagtgtt gataagattg 420 
cccgcgagct tcagtcaaat catgtacgtc agggacaatc gcgttatttt ctcggaattg 480 
gcgctacgtt agtattaagt ggcacattct tgttggtcag ccgacctgaa tgggggctga 540 
tgcccggctg gttaatggca ggtggtctga tcgcctggtt tgtcggttgg cgcaaaacac 600 
gctgattttt tcatcgctca aggcgggccg tgtaacgtat aatgcggctt tgtttaatca 660 
tcatctacca cagaggaaca tgtatgggtg gtatcagtat ttggcagtta ttgattattg 720 
ccgtcatcgt tgtactgctt tttggcacca aaaagctcgg ctccatcggt tccgatcttg 780 
gtgcgtcgat caaaggcttt aaaaaagcaa tgagcgatga tgaaccaaag caggataaaa 840 
ccagtcagga tgctgatttt actgcgaaaa ctatcgccga taagcaggcg gatacgaatc 900 
aggaacaggc taaaacagaa gacgcgaagc gccacgataa agagcaggtg taatccgtgt 960 
ttgatatcgg ttttagcgaa ctgctattgg tgttcatcat cggcctcgtc gttctggggc 1020 
cgcaacgact gcctgtggcg gtaaaaacgg tagcgggctg gattcgcgcg ttgcgttcac 1080 
tggcgacaac ggtgcagaac gaactgaccc aggagttaaa actccaggag tttcaggaca 1140 
gtctgaaaaa ggttgaaaag gcgagcctca ctaacctgac gcccgaactg aaagcgtcga 1200 
tggatgaact acgccaggcc gcggagtcga tgaagcgttc ctacgttgca aacgatcctg 1260 
aaaaggcgag cgatgaagcg cacaccatcc ataacccggt ggtgaaagat aatgaagctg 1320 
cgcatgaggg cgtaacgcct gccgctgcac aaacgcaggc cagttcgccg gaacagaagc 1380 
cagaaaccac gccagagccg gtggtaaaac ctgctgcgga cgctgaaccg aaaaccgctg 1440 
caccttcccc ttcgtcgagt gataaaccgt aaacatgtct gtagaagata ctcaaccgct 1500 
tatcacgcat ctgattgagc tgcgtaagcg tctgctgaac tgcattatcg cggtgatcgt 1560 
gatattcctg tgtctggtct atttcgccaa tgacatctat cacctggtat ccgcgccatt 1620 
gatcaagcag ttgccgcaag gttcaacgat gatcgccacc gacgtggcct cgccgttctt 1680 
tacgccgatc aagctgacct ttatggtgtc gctgattctg tcagcgccgg tgattctcta 1740 
tcaggtgtgg gcatttatcg ccccagcgct gtataagcat gaacgtcgcc tggtggtgcc 1800 
gctgctggtt tccagctctc tgctgtttta tatcggcatg gcattcgcct actttgtggt 1860 
ctttccgctg gcatttggct tccttgccaa taccgcgccg gaaggggtgc aggtatccac 1920 
cgacatcgcc agctatttaa gcttcgttat ggcgctgttt atggcgtttg gtgtctcctt 1980 
tgaagtgccg gtagcaattg tgctgctgtg ctggatgggg attacctcgc cagaagactt 2040 
acgcaaaaaa cgcccgtatg tgctggttgg tgcattcgtt gtcgggatgt tgctgacgcc 2100 
gccggatgtc ttctcgcaaa cgctgttggc gatcccgatg tactgtctgt ttgaaatcgg 2160 
tgtcttcttc tcacgctttt acgttggtaa agggcgaaat cgggaagagg aaaacgacgc 2220 
tgaagcagaa agcgaaaaaa ctgaagaata aattcaaccg cccgtcaggg cggttgtcat 2280 
atggagtaca ggatgtttga tatcggcgtt aatttgacca gttcgcaatt tgcgaaagac 2340 
cgtgatgatg ttgtagcgtg cgcttttgac gcgggagtta atgggctact catcaccggc 2400 
actaacctgc gtgaaagcca gcaggcgcaa aagctggcgc gtcagtattc gtcctgttgg 2460 
tcaacggcgg gcgtacatcc tcacgacagc agccagtggc aagctgcgac tgaagaagcg 2520 
attattgagc tggccgcgca gccagaagtg gtggcgattg gtgaatgtgg tctcgacttt 2580 
aaccgcaact tttcgacgcc ggaagagcag gaacgcgctt ttgttgccca gctacgcatt 2640 
gccgcagatt taaacatgcc ggtatttatg cactgtcgcg atgcccacga gcggtttatg 2700 
acattgctgg agccgtggct ggataaactg cctggtgcgg ttcttcattg ctttaccggc 2760 
acacgcgaag agatgcaggc gtgcgtggcg catggaattt atatcggcat taccggttgg 2820 
gtttgcgatg aacgacgcgg actggagctg cgggaacttt tgccgttgat tccggcggaa 2880 
aaattactga tcgaaactga tgcgccgtat ctgctccctc gcgatctcac gccaaagcca 2940 
tcatcccggc gcaacgagcc agcccatctg ccccatattt tgcaacgtat tgcgcactgg 3000 
cgtggagaag atgccgcatg gctggctgcc accacggatg ctaatgccaa aacactgttt 3060 
gggattgcgt tttagagttt gcggaactcg gtattcttca cactgtgctt aatctcttta 3120 



<210> 46 
<211> 312 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: synthetic 



<400> 46 

atgcggcttt gtttaatcat catctaccac agaggaacat gtatgggtgg tatcagtatt 60 
tggcagttat tgattattgc cgtcatcgtt gtactgcttt ttggcaccaa aaagctcggc 120 
tccatcggtt ccgatcttgg tgcgtcgatc aaaggcttta aaaaagcaat gagcgatgat 180 
gaaccaaagc aggataaaac cagtcaggat gctgatttta ctgcgaaaac tatcgccgat 240 
aagcaggcgg atacgaatca ggaacaggct aaaacagaag acgcgaagcg ccacgataaa 300 
gagcaggtgt aa 312 



<210> 47 
<211> 103 
<212> PRT 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: synthetic 
<400> 47 

Met Arg Leu Cys Leu Ile Ile Ile Tyr His Arg Gly Thr Cys Met Gly 
1 5 10 15 

Gly Ile Ser Ile Trp Gln Leu Leu Ile Ile Ala Val Ile Val Val Leu 
20 25 30 

Leu Phe Gly Thr Lys Lys Leu Gly Ser Ile Gly Ser Asp Leu Gly Ala 
35 40 45 

Ser Ile Lys Gly Phe Lys Lys Ala Met Ser Asp Asp Glu Pro Lys Gln 
50 55 60 

Asp Lys Thr Ser Gln Asp Ala Asp Phe Thr Ala Lys Thr Ile Ala Asp 
65 70 75 80 

Lys Gln Ala Asp Thr Asn Gln Glu Gln Ala Lys Thr Glu Asp Ala Lys 
85 90 95 

Arg His Asp Lys Glu Gln Val 
100 



<210> 48 
<211> 515 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: synthetic 
<400> 48 

tgtttgatat cggttttagc gaactgctat tggtgttcat catcggcctc gtcgttctgg 60 
ggccgcaacg actgcctgtg gcggtaaaaa cggtagcggg ctggattcgc gcgttgcgtt 120 
cactggcgac aacggtgcag aacgaactga cccaggagtt aaaactccag gagtttcagg 180 
acagtctgaa aaaggttgaa aaggcgagcc tcactaacct gacgcccgaa ctgaaagcgt 240 
cgatggatga actacgccag gccgcggagt cgatgaagcg ttcctacgtt gcaaacgatc 300 
ctgaaaaggc gagcgatgaa gcgcacacca tccataaccc ggtggtgaaa gataatgaag 360 
ctgcgcatga gggcgtaacg cctgccgctg cacaaacgca ggccagttcg ccggaacaga 420 
agccagaaac cacgccagag ccggtggtaa aacctgctgc ggacgctgaa ccgaaaaccg 480 
ctgcaccttc cccttcgtcg agtgataaac cgtaa 515 

<210> 49 
<211> 161 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: synthetic 
<400> 49 

Val Phe Asp Ile Gly Phe Ser Glu Leu Leu Leu Val Phe Ile Ile Gly 
1 5 10 15 

Leu Val Val Leu Gly Pro Gln Arg Leu Pro Val Ala Val Lys Thr Val 
20 25 30 

Ala Gly Trp Ile Arg Ala Leu Arg Ser Leu Ala Thr Thr Val Gln Asn 
35 40 45 

Glu Leu Thr Gln Glu Leu Lys Leu Gln Glu Phe Gln Asp Ser Leu Lys 
50 55 60 






Lys Val Glu Lys Ala Ser Leu Thr Asn Leu Thr Pro Glu Leu Lys Ala 
65 70 75 80 

Ser Met Asp Glu Leu Arg Gln Ala Ala Glu Ser Met Lys Arg Ser Tyr 
85 90 95 

Val Ala Asn Asp Pro Glu Lys Ala Ser Asp Glu Ala His Thr Ile His 
100 105 110 

Asn Pro Val Val Lys Asp Asn Glu Ala Ala His Glu Gly Val Thr Pro 
115 120 125 

Ala Ala Ala Gln Thr Gln Ala Ser Ser Pro Glu Gln Lys Pro Glu Thr 
130 135 140 

Thr Pro Glu Pro Val Val Lys Pro Ala Ala Asp Ala Glu Pro Lys Thr 
145 150 155 160 

Ala 






INTERNATIONAL SEARCH REPORT 



International Application No 

PCT/CA 99/00272 



A. CLASSIFICATION OF SUBJECT MATTER 

IPC 6 C12N15/63 C12N15/31 C07K14/245 C12N15/62 C12P21/02 



According to International Patent Classification (IPC) or to both national classification and IPC 

B. FIELDS SEARCHED 

Minimum documentation searched (classification system followed by classification symbols) 

IPC 6 C12N C07K 



Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched 



Electronic data base consulted during the international search (name of data base and, where practical, search terms used) 



C. DOCUMENTS CONSIDERED TO BE RELEVANT 



Category ° Citation of document, with indication, where appropriate, of the relevant passages 



Relevant to claim No. 



SETTLES, M. ET AL.: "Sec-independent 
protein translocation by the maize Hcf106 
protein" 
SCIENCE, 

vol. 278, 21 November 1997 (1997-11-21), 
pages 1467-1470, XP002113153 
cited in the application 
figure 4 

-/-- 



1,2 






Further documents are listed in the continuation of box C. 



□ 



Patent family members are listed in annex. 



° Special categories of cited documents : 

"A" document defining the general state of the art which is not 
considered to be of particular relevance 

"E" earlier document but published on or after the international 
filing date 

"L" document which may throw doubts on priority claim(s) or 
which is cited to establish the publication date of another 
citation or other special reason (as specified) 

"O" document referring to an oral disclosure, use, exhibition or 
other means 

"P" document published prior to the international filing date but 
later than the priority date claimed 



"T" later document published after the international filing date 
or priority date and not in conflict with the application but 
cited to understand the principle or theory underlying the 
invention 

"X" document of particular relevance; the claimed invention 
cannot be considered novel or cannot be considered to 
involve an inventive step when the document is taken alone 

"Y" document of particular relevance; the claimed invention 

cannot be considered to involve an inventive step when the 
document is combined with one or more other such documents, 
such combination being obvious to a person skilled 
in the art. 

"&" document member of the same patent family 



Date of the actual completion of the international search 

24 August 1999 


Date of mailing of the international search report 

03/09/1999 


Name and mailing address of the ISA 

European Patent Office, P.B. 5818 Patentlaan 2 
NL - 2280 HV Rijswijk 
Tel. (+31-70) 340-2040, Tx. 31 651 epo nl, 
Fax: (+31-70) 340-3016 


Authorized officer 

Andres, S 



Form PCT/ISA/210 (second sheet) (July 1992) 






C.(Continuation) DOCUMENTS CONSIDERED TO BE RELEVANT 



Category ° | Citation of document, with indication, where appropriate, of the relevant passages | Relevant to claim No. 

X | NIVIERE, V. ET AL.: "Site-directed mutagenesis of the hydrogenase signal peptide consensus box prevents export of a beta-lactamase fusion protein" JOURNAL OF GENERAL MICROBIOLOGY, vol. 138, 1992, pages 2173-2183, XP002113154 ISSN: 0001-2961 | 6,12 

A | the whole document | 7-11, 13-19 

A | BERKS, B.: "A common export pathway for proteins binding redox cofactors ?" MOLECULAR MICROBIOLOGY, vol. 22, 1996, pages 393-404, XP002113155 cited in the application the whole document | 6-19 

A | SANTINI C L ET AL: "A novel sec-independent periplasmic protein translocation pathway in Escherichia coli." EMBO JOURNAL, (1998 JAN 2) 17 (1) 101-12., XP002113156 the whole document | 6-19 

P,X | WEINER J H ET AL: "A novel and ubiquitous system for membrane targeting and secretion of cofactor-containing proteins." CELL, (1998 APR 3) 93 (1) 93-101., XP002113157 the whole document | 

P,X | SARGENT F ET AL: "Overlapping functions of components of a bacterial Sec-independent protein export pathway." EMBO JOURNAL, (1998 JUL 1) 17 (13) 3640-50., XP002113158 the whole document | 1-5 

T | DALBEY R E ET AL: "Protein translocation into and across the bacterial plasma membrane and the plant thylakoid membrane" TIBS TRENDS IN BIOCHEMICAL SCIENCES, vol. 24, no. 1, January 1999 (1999-01), page 17-22 XP004155514 ISSN: 0968-0004 | 1-5 





Form PCT/ISA/210 (continuation of second sheet) (July 1992) 






